Triangle Computer Science Distinguished Lecturer Series

Data for Good: Data Science at Columbia University with Dr. Jeannette Wing
D106 LSRC, Duke (telecast from UNC)
Monday April 23 2018
4:00PM – 5:00PM
Snacks and coffee will be served.

Abstract:

Every field has data. We use data to discover new knowledge, to interpret the world, to make decisions, and even to predict the future. The recent convergence of big data, cloud computing, and novel machine learning algorithms and statistical methods is causing an explosive interest in data science and its applicability to all fields. This convergence has already enabled the automation of some tasks that better human performance. The novel capabilities we derive from data science will drive our cars, treat disease, and keep us safe. At the same time, such capabilities risk leading to biased, inappropriate, or unintended action. The design of data science solutions requires both excellence in the fundamentals of the field and expertise to develop applications which meet human challenges without creating even greater risk. The Data Science Institute at Columbia University promotes Data for Good: using data to address societal challenges and bringing humanistic perspectives as, not after, new science and technology is invented. Started in 2012, the Institute is now a university-level institute representing over 300 affiliated faculty from 12 different schools across campus. Data science literally touches every corner of the university. In this talk, I will present the mission of the Institute, highlights of our educational and research activities, and plans for future initiatives.

Biography

Jeannette M. Wing is Avanessians Director of the Data Science Institute and Professor of Computer Science at Columbia University. From 2013 to 2017, she was a Corporate Vice President of Microsoft Research. She is Consulting Professor of Computer Science at Carnegie Mellon where she twice served as the Head of the Computer Science Department and had been on the faculty since 1985. From 2007 to 2010 she was the Assistant Director of the Computer and Information Science and Engineering Directorate at the National Science Foundation. She received her S.B., S.M., and Ph.D. degrees in Computer Science, all from the Massachusetts Institute of Technology. Professor Wing’s general research interests are in the areas of trustworthy computing, specification and verification, concurrent and distributed systems, programming languages, and software engineering. Her current interests are in the foundations of security and privacy, with a new focus on trustworthy AI. She was or is on the editorial board of twelve journals, including the Journal of the ACM and Communications of the ACM. She is currently a member of: the National Library of Medicine Blue Ribbon Panel; the Science, Engineering, and Technology Advisory Committee for the American Academy of Arts and Sciences; the Board of Trustees for the Institute for Pure and Applied Mathematics; the Advisory Board for the Association for Women in Mathematics; and the Alibaba DAMO Technical Advisory Board. She has been chair and/or a member of many other academic, government, and industry advisory boards. She received the CRA Distinguished Service Award in 2011 and the ACM Distinguished Service Award in 2014. She is a Fellow of the American Academy of Arts and Sciences, the American Association for the Advancement of Science, the Association for Computing Machinery (ACM), and the Institute of Electrical and Electronics Engineers (IEEE).

Host: Ketan Mayer-Patel

Please join us tomorrow afternoon for an engaging talk on Data Science: Myth vs Practice in the Intelligence Community. Details and abstract below. Space in Hunt Library’s IEI 4105 is limited. We hope to see you there!

Data Science: Myth vs Practice in the Intelligence Community

Speaker:

Dr. Oksana Lassowsky

Mathematician/Multi-Disciplined Language Analyst

Europe Division, Technical Director (NSA)

Abstract:

Data and Intelligence Analysis – Myth vs Mission:

This presentation will examine and debunk several myths that at times confound the effective use of Data Science in Intelligence Analysis, drawing on case studies of both successes and failures. The mission case studies illustrate the deep understanding of both computational techniques and the intelligence subject matter required to build compelling, actionable intelligence. Thus, for full mission impact, genuine collaboration between the domains is critical throughout the whole data science process, from iteratively formulating a computational approach to the meaningful interpretation of every stage of results. Drawing on myriad analytic efforts, NSA developed and offers two counterpart courses – “Data Science for Intelligence Analysts” (for the mission analysts) and “Data Science Fundamentals” (for the scientists), and a workshop for leadership, “Data Science Mission Strategy”. The foundational courses help each role to engage in the kind of effective cross-discipline collaboration that will get the most out of the data and the science in answering intelligence mission needs. In support of applying data science, however, more robust data engineering, coherent and consistent analytic tool designs, compatible platforms, and clarity in compliance remain critical needs. In particular, developing and implementing a genuine plain English information architecture that governs all tool and navigation design is crucial.

When
Tue Mar 27, 2018 14:00 – 15:00 Eastern Time
Where
James B. Hunt Jr. Library- IEI 4105 (4th Floor) (map)

Democratizing Large-Scale Data and Machine Learning in Materials Research

Bryce Meredig, Ph.D.

Dr. Bryce Meredig
Chief Science Officer
Citrine Informatics
Friday, September 22nd at 11:00am – EB1 – 1011

Abstract:
Over the first five years of the Materials Genome Initiative (MGI), the materials community has gained new appreciation for the enormous potential of digital data in the research enterprise. Nonetheless, the fact remains that the vast majority of materials research data is neither widely accessible nor readily computable (i.e., amenable to statistical analyses and machine learning). If we characterize the first five years of the MGI as the prerequisite infrastructure-building period, we anticipate that the next five years of MGI will be marked by rapid proliferation of machine learning within materials research, and a concomitant flourishing of newly enabled, data-driven discoveries. Our vision at Citrine is for our Open Citrination platform to democratize these benefits of data-driven research across the materials community.

Bio:
Dr. Meredig’s research interest is the application of machine learning to materials science. He earned his Ph.D. in Materials Science from Northwestern University, where he focused on materials informatics, and his BAS and MBA at Stanford University, where he is also on the faculty of the Department of Materials Science and Engineering. He is the author of over 20 peer-reviewed publications and regularly gives invited talks at materials conferences including MRS, TMS, and MS&T, as well as plenaries and keynotes at workshops focused on data-driven materials research. Dr. Meredig was an Arjay Miller Scholar and Terman Fellow at Stanford and a Presidential Fellow and NDSEG Fellow at Northwestern.

From August 7-11, 2017, over 190 individuals took part in Data Matters™ 2017 in Hunt Library on NCSU’s Centennial Campus. Data Matters™ is a week-long series of one- and two-day courses aimed at professionals in business, academia, research, non-profits, and government. Courses are designed for all skill levels, from beginners in data science struggling to stay afloat in the data deluge to data science practitioners who are regularly grappling with large, complex data. Other attendees simply want to sharpen their data science skills. Course topics include Intro to Data Science Using R, Effective Information Visualization, Programming in R, Data Curation, and Working with Messy Data, among others. Fifteen courses are taught throughout the week, and this year 6 of the 15 were sold out or even over capacity.
“Attendees at Data Matters™ 2017 came to NC State from across the United States and around the world,” said Jamie Roseborough, Program Manager for the Data Science Initiative at NC State University. “We have individuals from Arizona, Ohio, Pennsylvania, North and South Carolina, Virginia, and even one gentleman came from Delhi, India!” NCSU partners with both UNC Chapel Hill’s Renaissance Computing Institute (Renci) and The Odum Institute, as well as the National Consortium for Data Science (NCDS). “We have an outstanding partnership with our colleagues at Renci, Odum, and NCDS to develop and execute Data Matters each year. We strive to keep our overhead costs low so that our registration fees are reasonable and manageable for the entire week,” said Roseborough. Feedback is gathered throughout the week to assist in changes or the development of new courses for next year’s Data Matters™. Roseborough concluded, “Many attendees are working with data in very different ways, but one common theme we are seeing is a broad interest in data visualization. As we look towards next August, we will take this need into account and either expand course offerings in visualization or add a more advanced course or two to allow folks to expand their skillset and build off this year’s training.”
All information pertaining to Data Matters™ 2017 and updates on the planning for Data Matters™ 2018 can be found at datamatters.org or by connecting with Jamie Roseborough at jvrosebo@ncsu.edu.
If you have any questions or need any additional information, please contact Jamie Roseborough, DSI Program Manager, at either 919-515-7320 or jvrosebo@ncsu.edu.

NC State has been awarded a $414,000 grant from the Andrew W. Mellon Foundation to support the advancement of tools and techniques for developing and sharing large-scale visual content for research.

Entitled “Visualizing Digital Scholarship in Libraries and Learning Spaces,” the project aims to continue the NCSU Libraries’ pioneering work with large-scale, research visualization technologies. According to Greg Raschke, associate director for collections and scholarly communication at the Libraries and one of the project’s principal investigators, in order to move forward with this work, two issues need to be addressed.


 ‘NC in the Next Tech Tsunami: Navigating the Data Economy’

North Carolina’s Leadership at Stake as the ‘Data Economy’ Rises. The National Consortium for Data Science (NCDS) and the N.C. Board of Science, Technology & Innovation recommend action to position the state for success in the new economic environment.

Raleigh, N.C. – Closer collaboration, proactive branding, and a greater focus on data science education and talent development will propel North Carolina to the top of the emerging data economy, according to a new report published today by the North Carolina Board of Science, Technology & Innovation. Leadership in the data economy is becoming increasingly critical as more and more economic value is based on the ability to successfully collect and manipulate data for insight and profit. Read the rest of the Press Release here.

Access and download the report here.

From Outlier to Insider — A Networking Event with Data Science Professionals

The National Consortium for Data Science (NCDS) and the Kenan-Flagler Business School at UNC-Chapel Hill will host a networking event for students interested in careers in analytics and data science.
NCDS members and other employers who attend the networking event will have the chance to interact with and potentially recruit top-tier analytics and data science talent.
Student attendees will have a unique opportunity to chat one-on-one with company representatives and learn about job opportunities and skill sets needed for success in data analytics fields.
Tuesday, February 7
6:00-9:00 PM
The Frontier – Home Base, Research Triangle Park
There is no cost to students or employers to attend this event.
To Register:
bigdatacareer.eventbrite.com

 

NC State Hosts NSF Big Data Research Center

 

NC State University has been selected as a new site of the existing Center for Hybrid Multicore Productivity Research (CHMPR). The new NC State site, which opened September 1, is hosted in the Laboratory for Science of Technologies for End-to-End Enablement of Data in the College of Engineering’s Department of Computer Science.

“I am very excited about joining CHMPR, because this happens to be a range of my personal research interests,” said Rada Chirkova, associate professor of computer science and principal investigator for the site. “It’s exciting to be given this great opportunity to work on something you’ve always been interested in.”

The research challenges addressed by CHMPR center around big-data analytics. The new site’s main goal will be to conduct trans-disciplinary translational science and research on enabling better decision making in the presence of big data.

CHMPR is part of the Industry–University Cooperative Research Centers Program (IUCRC) supported by the National Science Foundation, which enables industrially relevant, pre-competitive research via multi-member, sustained partnerships between industry, academe, and government. Centers bring together faculty and students from academic institutions with companies, state/federal/local government, and non-profits, to perform cutting-edge pre-competitive fundamental research in science, engineering, and technology areas that are of interest to industry and that can drive innovation and the U.S. economy. Members guide the direction of center research through active involvement and mentoring.

The CHMPR/NC State effort toward end-to-end enablement of data will focus on developing technologies and tools for bridging the time gap between the acquisition of data and real-time and long-term decision making.

“One very unique thing that is happening is data wrangling – this is really how you prepare data and how you put the data together for your big-data analytics. This is a well-known really hard problem out in the industry,” Chirkova said. “That is a very time-consuming, labor-intensive and expensive part of the process. So whatever can be automated will be welcomed by the industry and government. We try to specialize in this, but will also do the full range of big-data analytics.”

The fundamental research conducted in the CHMPR center is also expected to translate into technologies that will aid industry, federal agencies and government agencies. The delivery of practical solutions to difficult problems will aid in precompetitive research, and will help create forward-oriented opportunities for industry and government.

According to Chirkova, the work being done at NC State complements the research being done in the Center. “At NC State, we are a premier place for data science and big-data analytics; it’s great to know NC State is an asset many people know about.”

CHMPR comprises the University of Maryland, Baltimore County; the University of California, San Diego; the University of Utah; Rutgers University; and, now, NC State. The overall CHMPR program is in the second phase of IUCRC support.

For more information on CHMPR and the semiannual meeting, visit: https://www.steed.ncsu.edu/chmpr-planning-meeting/.

NC State Libraries Workshops

Andrew Odewahn

Chief Technology Officer
O’Reilly Media


Narrated Learning Experience: Blending Code, Data, Text, and Video

WHEN


2:00pm – 3:00pm

WHERE

EVENT DESCRIPTION

Andrew Odewahn, Chief Technology Officer at O’Reilly Media, will demonstrate a suite of tools, including GitHub, Docker, and Jupyter, and how they are being used in O’Reilly Media’s publishing workflows. Andrew will also present new projects geared towards blending code, data, text, and video into a narrated learning experience with executable content.

This is a bring your own laptop event.

ADMISSION INFORMATION

The event is free and open to the public but registration is required. Please click HERE to register.

“git push” as the Future of Publishing

WHEN


11:00am – 12:00pm

WHERE

EVENT DESCRIPTION

Many organizations have a huge opportunity to take advantage of a new generation of open source analytical and data visualization tools.  However, because they’re often built by hackers for hackers, these tools usually rely on a complex system of dependencies and concepts that can make them much more difficult to adopt than purpose-built tools from proprietary vendors.  So while the benefits in terms of cost savings and rapid innovation are huge, reaping the rewards often requires organizations to rethink processes and workflows as a software process.  For almost 5 years, O’Reilly Media has centered its publishing processes around tools like Jupyter, git, GitHub, Docker, and a host of open source packages.

In this talk, Andrew Odewahn, Chief Technology Officer at O’Reilly Media, will talk about some of the opportunities and challenges encountered in making the shift from traditional media to software development.

ADMISSION INFORMATION

The event is free and open to the public.

2016 Triangle Statistics Genetics Conference

Monday, October 31st
Theme: “The Genome and the Environment”

Time

9:20 – 4:00
8:45 Check-In & Continental Breakfast

Location

Executive Briefing Center, SAS Campus
Lunch Provided

 

Speakers

Pierre Bushel (NIEHS)
Denis Fourches (NCSU)
Elizabeth Jensen (Wake Forest U.)
Carol Hamilton (RTI)
Michael Love (UNC)
Kelci Miclaus (SAS)
Sayan Mukherjee (Duke)
Ellie Rahbar (Wake Forest U.)
David Reif (NCSU)
Praveen Sethupathy (UNC)
ClarLynda Williams (NCCU)
Yi-Hui Zhou (NCSU)

Organizers

Fred Wright (NCSU, chair), Andrew Allen (Duke),
Carl Langefeld (WFU), Yun Li (UNC),
Kelci Miclaus (SAS), Jung-Ying Tzeng (NCSU),
Liling Warren (Acclarogen)

Register here by October 26.  Limited to the first 145 registrants.

Cognitive and Data Sciences Education Workshop

Sunday, October 23rd 2016, 9am-7pm,
Four Seasons Ballroom, Las Vegas NV

IEEE Big Data Initiative, the Business-Higher Education Forum, and IBM are partnering to host a one-day workshop on Sunday, October 23, just before World of Watson. The workshop is suited for academics responsible for artificial intelligence, machine learning, cognitive science, citizen analyst, data science, or data engineering curriculum and programs.

Many schools have moved to establish specialist data science programs, but is that focus enough to meet the rapidly changing needs of employers?

Join the conversation on October 23rd to engage with academic and industry leaders. Learn how others are driving the evolution.  Questions on the table for the day include:

  • How will these emerging fields evolve?
  • What competencies will every student require?
  • What competencies will specialists require?
  • What competencies will business expect students to acquire?

If you are responsible for curriculum and program design, or for building the talent pool for your organization, this event is tailor-made for you. Learn how to outthink the skills gap.

This special event immediately precedes World of Watson in Las Vegas which runs from October 24th to 27th.

Keynote Speakers:

  • The future of Artificial Intelligence
    by Michael Karasick, VP Cognitive Computing, IBM Research
  • How should education evolve with cognitive and data science?
    Closing panel hosted by Guru Banavar, Chief Science Officer for Cognitive Computing, IBM Research

Hands-on workshops with emerging cognitive and data science technology

Data Science:

  • Empower every student to have smarter data driven decision making
  • Making data science a team sport

Cognitive Science:

  • Welcome to the cognitive era; a hands-on with Watson services
  • The new internet of cognitive data driven things

Invited Talks

Data Science Education

  • Civic data as the material for learning: bridging the gap from the 4 year degree
    Catherine Nikolovski, Hack Oregon
  • The data science enabled professional
    Brian Fitzgerald, Business-Higher Education Forum
  • Building the data science profession
    Yuri Demchenko, University of Amsterdam
  • Data Science for Every Student at RPI
    Peter Fox, Rensselaer Polytechnic Institute (30 minutes)

Cognitive Science Education

  • Merging Tech & Marketing in a Cognitive World
    Randy Hvalac, Northwestern
  • Next generation cognitive curriculum
    Jim Spohrer, IBM
  • The nexus of society, data and cognitive sciences: Driving the evolution of curriculum
    Nitesh Chawla, Notre Dame
  • Co-Innovation Strategies for Empowering the Bottom of the Pyramid
    Solomon Darwin, UC Berkeley
  • Cognitive as an enabler of tech entrepreneurship in Saudi Arabia
    Artemisa Jaramillo, Princess Noura University
  • Experiences integrating Watson into my courses
    Gordon Pipa, Universität Osnabrück

Cognitive & Data Science Research

  • A Transdisciplinary Holodeck for Research, Education, and Innovation
    Winslow Burleson, New York University
  • The power of Cognitive Computing for the Internet of Things
    Eleni Pratsini, IBM Research
  • Computational Argumentation
    Aya Soffer, IBM Research

Want to learn more about the future of cognitive and data sciences?

Spend the week at World of Watson following the workshop.

Join an elite group of thought leaders, top academics, inspired architects, data scientists, developers, engineers, inventors and business leaders to explore the future of artificial intelligence, cognitive systems, and data science.

Register today. Qualified academics receive special conference pricing.

ibm.biz/IBMWoWReg

Sponsors:

  • IBM
  • IEEE Big Data
  • Business Higher Education Forum

Computing Research and the Emerging Field of Data Science 

The Computing Research Association released a bulletin on the emerging field of Data Science on October 7, 2016. Excerpts are as follows:

By CRA’s Committee on Data Science: Lise Getoor (Chair), David Culler, Eric de Sturler, David Ebert, Mike Franklin, and H.V. Jagadish on behalf of the CRA Board

Our ability to collect, manipulate, analyze, and act on vast amounts of data is having a profound impact on all aspects of society.  This transformation has led to the emergence of data science as a new discipline[1].  The explosive growth of interest in this area has been driven by research in social, natural, and physical sciences with access to data at an unprecedented scale and variety, by industry assembling huge amounts of operational and behavioral information to create new services and sources of revenue, and by government, social services and non-profits leveraging data for social good.   This emerging discipline relies on a novel mix of mathematical and statistical modeling, computational thinking and methods, data representation and management, and domain expertise.   While computing fields already provide many principles, tools and techniques to support data science applications and use cases, the computer science community also has the opportunity to contribute to the new research needed to further drive the development of the field.  In addition, the community has the obligation to engage in developing guidelines for the responsible use of data science.

Data science starts with a strong set of foundations adapted from several fields including statistics, mathematics, social science, natural sciences, and computer science.   Already, virtually all aspects of traditional computer science research have played a role in the development of data science.  And looking forward, data science will drive fundamentally new computing research.

  1. From a data management perspective, data science requires a much deeper understanding and representation of how data is acquired, stored and accessed. Data lineage, data quality, quality assurance, data integration, storage, privacy, and security all need to be rethought. The traditional approach of acquisition, followed by storage, and processing often does not work for high-rate or sensitive data.
  2. From a computational point of view, very large data volumes, very high data rates, and very large numbers of users, demand new systems and new algorithms.   New system architectures that can accommodate the heterogeneity and irregular structure in data access and communication are needed.   From an algorithmic perspective, there is a need for sublinear algorithms, online algorithms that support real-time data streams, and probabilistic and stochastic approaches to accommodate both scale and noise in the data.
  3. Furthermore, many classic statistical assumptions and machine learning techniques do not fit current data science needs.   Often derived from natural sources, data is increasingly likely to be biased, incomplete and highly heterogeneous. Systematic errors arising in automated data collection and semantic inconsistencies that result from stitching data together from multiple sources across longer time horizons present profound modeling challenges and opportunities for the development of new statistical methods and machine learning algorithms.   Even in the small data setting, new techniques that can cope with heterogeneity and biased sampling are needed.  While predictive modeling is important, many data science problems involve decision making, and the ability to reason about alternate courses of action is needed.  In addition, understanding the curse of dimensionality, overfitting, and causality in these complex settings is critical.
  4. The challenges in scale and heterogeneity also fundamentally change how users interact with data and models, how the data is visualized, what algorithms are needed to support understanding and interpretation of the results of data science models, how decisions are made, and how user feedback is acquired and incorporated. Human computer interaction and visual analytics will need to be more tightly integrated with data science models and algorithms. New use cases for natural language processing, speech, computer vision and other human-machine communication modes will emerge.
  5. Because data science systems are often embedded in operational systems with changing demands and distributions, supporting the entire data science lifecycle is important. Ensuring the robustness of all aspects of the pipeline is important. New software engineering and computer programming best practices will need to be developed. Additionally, data artifacts will often persist beyond their initially planned usage, so longer-term curation and management must also be addressed.

The 4th International Workshop on Big Data and Social Networking Management and Security

December 5-7, 2016, Barcelona, Spain

Call for Papers:

Data has always been recognized as an important asset for driving value, be it scientific, governmental, or for enterprise. The amount of data being generated constantly with high volume, velocity, and variety has marked the emergence of “Big Data” as a contemporary research challenge with vast opportunities. In the context of the digital world, the internet and social networks have contributed immensely to generating Big Data pools. Hundreds of millions of people around the globe are nowadays connected using different types of social networks. Big Data and social networks have become interrelated in the modern computing agenda, as both are concerned with intensive data. As many opportunities as they hold, Big Data and social networks come with their own challenges, such as management, security, and processing. In order to unlock the potential of these technologies, innovative solutions are required, which will leverage new models in computing. The 4th International Workshop on Big Data and Social Networking Management and Security will be a forum for scientists, researchers, students, and practitioners to present their latest research results, ideas, developments, and applications in the areas of big data and social networking. We are mainly interested in receiving state-of-the-art work on different aspects of big data management systems, big data security and privacy, cloud computing and big data, social networking and big data analytics, and social networking management and security, to mention but a few.

The topics of interest for this workshop include, but are not limited to:

  • Big data and Social Networking concepts and applications
  • Emerging technologies in Big Data and Social Networking
  • Management Issues of Social Network Big Data
  • Security challenges in Big Data and Social networks
  • Social Network and Big Data Analytics
  • Open Source tools for Big Data
  • Green Computing for Big Data
  • Network Infrastructure for Social Networking and Big Data
  • Social Networks Monitoring Tools As a Service
  • Cloud Computing for Big Data and Social Networks
  • Big Data and the Internet of Things
  • Big Data Management
  • Big Data and Decision Making
  • Visualization tools for Big Data
  • Mobile Cloud networks and Big Data
  • Social network data analysis tools and services on the Cloud
  • Case studies

Important Dates

  • Paper Submission: September 15, 2016
  • Author’s Notification: October 15, 2016
  • Camera Ready and Registration Deadline: October 30, 2016

Instructions for Authors

Prospective authors are invited to submit full papers of up to 6 pages, strictly following the IEEE Proceedings Templates.  Submitted papers will be peer-reviewed and prospective authors are expected to present their papers at the conference.  The papers that are accepted and presented at the conference will appear in conference proceedings.  At least one author of each accepted submission must attend the workshop.

Please submit your paper in PDF format via the electronic submission system

Please send any inquiry on BDSN 2016 to Mohammad AL-Smadi (smadi.mohammad@gmail.com) and Yaser Jararweh (yijararweh@just.edu.jo).

Workshop on Distributed and Parallel Data Analysis (DPDA)

When: September 21-23, 2016
Where: Statistical and Applied Mathematical Sciences Institute (SAMSI), 19 T.W. Alexander Drive, Research Triangle Park, NC 27709-4006

Description

The workshop aims to bring academic researchers and industrial engineers together to explore and discuss recent challenges faced by practitioners in distributed data analytics, along with related theory and proven best practices in both academia and industry.

In recent work in computational mathematics and machine learning, great strides have been made in distributed optimization and distributed learning. For example, using ‘consensus’ between local variables and a global variable, the Alternating Direction Method of Multipliers (ADMM) algorithm can be utilized to solve a distributed version of the LASSO problem. On the other hand, classical statistical methodology, theory, and computation are based on the assumption that the entire data set is available at a central location; this is a significant shortcoming in modern problem solving. It is known that computing at a single machine can be thousands of times faster than data transmission between locations.
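
As a rough illustration of the consensus idea mentioned above, the following minimal Python sketch runs the standard scaled-form consensus ADMM updates for a LASSO problem whose rows are split across several workers. It is not taken from the workshop materials. The function names (`consensus_admm_lasso`, `make_blocks`), the synthetic data, and the choices of `lam`, `rho`, and iteration count are all assumptions made for this example. Each worker only needs its own block (A_i, b_i); the only quantities that would cross the network are the local estimates and the shared variable z.

```python
import random


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the proximal operator of kappa * ||.||_1."""
    return [max(0.0, x - kappa) - max(0.0, -x - kappa) for x in v]


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def consensus_admm_lasso(blocks, lam, rho=1.0, iters=200):
    """Minimize sum_i 0.5*||A_i x - b_i||^2 + lam*||x||_1 over row blocks (A_i, b_i)."""
    n = len(blocks[0][0][0])
    N = len(blocks)
    # Each worker precomputes A_i^T A_i + rho*I and A_i^T b_i from its local rows.
    local = []
    for A, b in blocks:
        gram = [[sum(row[j] * row[k] for row in A) + (rho if j == k else 0.0)
                 for k in range(n)] for j in range(n)]
        atb = [sum(row[j] * bi for row, bi in zip(A, b)) for j in range(n)]
        local.append((gram, atb))
    z = [0.0] * n                        # global consensus variable
    xs = [[0.0] * n for _ in range(N)]   # local primal variables
    us = [[0.0] * n for _ in range(N)]   # scaled dual variables
    for _ in range(iters):
        # Local x-updates: a ridge-like solve on each worker's own data.
        for i, (gram, atb) in enumerate(local):
            rhs = [atb[j] + rho * (z[j] - us[i][j]) for j in range(n)]
            xs[i] = solve(gram, rhs)
        # Global z-update: average the local estimates, then soft-threshold.
        avg = [sum(xs[i][j] + us[i][j] for i in range(N)) / N for j in range(n)]
        z = soft_threshold(avg, lam / (rho * N))
        # Dual updates: accumulate each worker's disagreement with the consensus.
        for i in range(N):
            us[i] = [us[i][j] + xs[i][j] - z[j] for j in range(n)]
    return z


def make_blocks(seed=0, workers=3, rows=20, noise=0.01):
    """Hypothetical synthetic data: a sparse true coefficient vector split over workers."""
    rng = random.Random(seed)
    truth = [2.0, 0.0, -1.0]
    blocks = []
    for _ in range(workers):
        A = [[rng.gauss(0.0, 1.0) for _ in truth] for _ in range(rows)]
        b = [sum(a * t for a, t in zip(row, truth)) + rng.gauss(0.0, noise) for row in A]
        blocks.append((A, b))
    return blocks, truth


if __name__ == "__main__":
    blocks, truth = make_blocks()
    print("true coefficients:", truth)
    print("consensus estimate:", consensus_admm_lasso(blocks, lam=0.1))
```

In this sketch each iteration exchanges only vectors of length n between workers and the coordinator, never the raw rows, which is the property that matters when data transmission is far slower than local computation.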

Specific goals of the workshop include (i) exposing academic researchers to both the challenges in industrial applications and current computing tools being used in industry, (ii) introducing industrial researchers to the frontiers of applied mathematical and statistical methods regarding distributed inference, and (iii) educating graduate students and early-career researchers about practical computing and theoretical studies in distributed analytics. The workshop will begin with a few tutorial-type lectures followed by lectures and panels on state-of-the-art research-based methods by leading researchers and practitioners in this emerging field of mathematics.

The workshop will be limited to about 50 participants, and priority for funding support will be given to U.S.-based researchers.

Apply to participate in this workshop

Questions: email dpda@samsi.info

Refining the Concept of Scientific Inference When Working With Big Data: A Workshop

WHEN: WHERE: Keck Center – 500 Fifth St. NW Room 100, Washington, D.C. 20001

The Committee on Applied and Theoretical Statistics invites you to attend a two-day workshop on the challenges of applying scientific inference to big data. The workshop will bring together statisticians, data scientists and domain researchers from different biomedical disciplines to explore four key issues of scientific inference:

  • Inference about causal discoveries driven by large observational data
  • Inference about discoveries from data on large networks
  • Inference about discoveries based on integration of diverse datasets
  • Inference when regularization is used to simplify fitting of high-dimensional models

The aim of the workshop is to identify new developments that hold significant promise and to highlight potential research program areas for the future. Please contact Michelle Schwalbe at mschwalbe@nas.edu with any questions.

View the preliminary workshop agenda.

Register for the workshop

Workshop of Mission-Critical Big Data Analytics (MCBDA 2016)

The first edition of the Mission-Critical Big Data Analytics workshop (MCBDA 2016) will take place on May 16-17, 2016 at Prairie View A&M University, Prairie View, TX, USA. This two-day workshop will cover state-of-the-art research and development in big data analytics, especially for mission-critical applications. There will be keynote speeches by renowned invited speakers and tutorial sessions with hands-on training. Research on cutting-edge topics in big data will be presented, and there will be a poster session and a demo session where students will present their work.

The registration fee is $200, which includes all technical sessions, USB proceedings, the workshop brochure and bag, two lunches on May 16 and 17, the welcome reception on the evening of May 16, and a tour of the Prairie View A&M University campus. Registration is open: Click here

PyData Carolinas 2016

September 14-16, 2016
Hosted by IBM Emerging Technologies
Research Triangle Park, NC

What is PyData?

PyData conferences bring together users and developers of data analysis tools to share ideas and learn from each other. The PyData community gathers to discuss how best to apply Python tools, as well as tools using R and Julia, to meet evolving challenges in data management, processing, analytics, and visualization. We aim to be an accessible, community-driven conference, with tutorials for novices, advanced topical workshops for practitioners, and opportunities for package developers and users to meet in person.

Call for Proposals

The event brings together analysts, scientists, developers, engineers, architects and others from the data science community to discuss new techniques and tools for management, analytics and visualization of data. PyData welcomes presentations focusing on Python as well as other languages used in data science (e.g. R, Julia). Presentation content can be at a novice, intermediate or advanced level.

Talks will run 30-40 minutes each (45-minute slots including time for questions), with 30 talk slots available. Twelve tutorials are planned: six running 90 minutes each and six running 2 hours each. As a reminder, PyData presentations are intended to share knowledge and experience. We welcome talks that let attendees know how you are using tools in your work, but we discourage proposals that aim to sell a product.

If you are interested in presenting a talk or tutorial, we encourage your submission(s). To see the type of topics presented at previous PyData events, please look at our past conference sites at pydata.org or check out the videos on YouTube or Vimeo.