Free Course on Learning to Use the NCSU Blade Center
- This free short course will meet for 5 hours total (over 2 sessions) on Thursday and Friday mornings of Fall break. (Thursday, October 5, 9:30 to 12:00, and Friday, October 6 9:30 to 12:00).
- The 2-session short course will be held in the AFH-108-OIT-Training Lab. Parking permits are not required during break. AFH is on Avent Ferry Rd. across from the Mission Valley Mall, the lower building south of the high-rise dorms. Here’s a link to an online campus map.
- E-mail (Gary Howell, email@example.com) in advance to ensure space availability.
- Graduate students, postdocs, faculty and staff who are likely to use parallel computation in research projects or theses are particularly invited. Since the class is on how to get started using the BladeCenter, getting a BladeCenter account before the class will help you. Faculty can start projects and add students to them from the web page https://projects.ncsu.edu/hpc/Accounts/Accounts.php
The first day of the class will cover how to use the Blade Center henry2 cluster, with examples of how to compile and run simple jobs. The second day is designed to help users port the codes they need; course materials on using Configure and Make are still in preparation. This module will focus on how to find libraries in nonstandard locations, verify that they are the needed libraries, and link to them.
Class notes for the first day can be downloaded from https://projects.ncsu.edu/hpc//Courses/Courses.php, where you can also find some sample codes.
Before class starts, students who do not already have a Blade Center account are encouraged to have their advisers request them so they can have a permanent account. Faculty can request accounts for themselves and for their students online from http://www.ncsu.edu/itd/hpc/About/Contact.php
Democratizing Large-Scale Data and
Machine Learning in Materials Research
Dr. Bryce Meredig
Chief Science Officer
Friday, September 22nd at 11:00am – EB1 – 1011
Over the first five years of the Materials Genome Initiative (MGI), the materials community has gained new appreciation for the enormous potential of digital data in the research enterprise. Nonetheless, the fact remains that the vast majority of materials research data is neither widely accessible nor readily computable (i.e., amenable to statistical analyses and machine learning). If we characterize the first five years of the MGI as the prerequisite infrastructure-building period, we anticipate that the next five years of MGI will be marked by rapid proliferation of machine learning within materials research, and a concomitant flourishing of newly enabled, data-driven discoveries. Our vision at Citrine is for our Open Citrination platform to democratize these benefits of data-driven research across the materials community.
Dr. Meredig’s research interest is the application of machine learning to materials science. He earned his Ph.D. in Materials Science from Northwestern University, where he focused on materials informatics, and his BAS and MBA at Stanford University, where he is also on the faculty of the Department of Materials Science and Engineering. He is the author of over 20 peer-reviewed publications and regularly gives invited talks at materials conferences including MRS, TMS, and MS&T, as well as plenaries and keynotes at workshops focused on data-driven materials research. Dr. Meredig was an Arjay Miller Scholar and Terman Fellow at Stanford and a Presidential Fellow and NDSEG Fellow at Northwestern.
NCSU’s Centennial Campus. Data Matters™ is a week-long series of one- and two-day courses aimed at professionals in business, academia, research, non-profits, and government. Courses are designed for all skill levels, from beginners in data science struggling to stay afloat in the data deluge to practitioners who regularly grapple with large, complex data. Other attendees simply want to sharpen their data science skills. Course topics include Intro to Data Science Using R, Effective Information Visualization, Programming in R, Data Curation, and Working with Messy Data, among others. Fifteen courses are taught throughout the week, and this year 6 of the 15 were sold out or even over capacity.
world,” said Jamie Roseborough, Program Manager for the Data Science Initiative at NC State University. “We have individuals from Arizona, Ohio, Pennsylvania, North and South Carolina, Virginia, and even one gentleman who came from Delhi, India!” NCSU partners with both UNC Chapel Hill’s Renaissance Computing Institute (RENCI) and The Odum Institute, as well as the National Consortium for Data Science (NCDS). “We have an outstanding partnership with our colleagues at RENCI, Odum, and NCDS to develop and execute Data Matters each year. We strive to keep our overhead costs low so that our registration fees are reasonable and manageable for the entire week,” said Roseborough. Feedback is gathered throughout the week to inform changes or the development of new courses for next year’s Data Matters™. Roseborough concluded, “Many attendees are working with data in very different ways, but one common theme we are seeing is a broad interest in data visualization. As we look towards next August, we will take this need into account and either expand course offerings in visualization or add a more advanced course or two to allow folks to expand their skillset and build off this year’s training.”
NC State has been awarded a $414,000 grant from the Andrew W. Mellon Foundation to support the advancement of tools and techniques for developing and sharing large-scale visual content for research.
Entitled “Visualizing Digital Scholarship in Libraries and Learning Spaces,” the project aims to continue the NCSU Libraries’ pioneering work with large-scale, research visualization technologies. According to Greg Raschke, associate director for collections and scholarly communication at the Libraries and one of the project’s principal investigators, in order to move forward with this work, two issues need to be addressed.
North Carolina’s Leadership at Stake as the ‘Data Economy’ Rises. The National Consortium for Data Science (NCDS) and the N.C. Board of Science, Technology & Innovation recommend action to position the state for success in the new economic environment.
Raleigh, N.C. – Closer collaboration, proactive branding, and a greater focus on data science education and talent development will propel North Carolina to the top of the emerging data economy, according to a new report published today by the North Carolina Board of Science, Technology & Innovation. Leadership in the data economy is becoming increasingly critical as more and more economic value is based on the ability to successfully collect and manipulate data for insight and profit. Read the rest of the Press Release here.
Access and download the report here.
NC State Hosts NSF Big Data Research Center
NC State University has been selected as a new site of the existing Center for Hybrid Multicore Productivity Research (CHMPR). The new NC State site, which opened September 1, is hosted in the Laboratory for Science of Technologies for End-to-End Enablement of Data in the College of Engineering’s Department of Computer Science.
“I am very excited about joining CHMPR, because this happens to be a range of my personal research interests,” said Rada Chirkova, associate professor of computer science and principal investigator for the site. “It’s exciting to be given this great opportunity to work on something you’ve always been interested in.”
The research challenges addressed by CHMPR center around big-data analytics. The new site’s main goal will be to conduct trans-disciplinary, translational science and research that enables better decision making in the presence of big data.
CHMPR is part of the Industry–University Cooperative Research Centers Program (IUCRC) supported by the National Science Foundation, which enables industrially relevant, pre-competitive research via multi-member, sustained partnerships between industry, academe, and government. Centers bring together faculty and students from academic institutions with companies, state/federal/local government, and non-profits, to perform cutting-edge pre-competitive fundamental research in science, engineering, and technology areas that are of interest to industry and that can drive innovation and the U.S. economy. Members guide the direction of center research through active involvement and mentoring.
The CHMPR/NC State effort toward end-to-end enablement of data will focus on developing technologies and tools for bridging the time gap between the acquisition of data and real-time and long-term decision making.
“One very unique thing that is happening is data wrangling – this is really how you prepare data and how you put the data together for your big-data analytics. This is a well-known really hard problem out in the industry,” Chirkova said. “That is a very time-consuming, labor-intensive and expensive part of the process. So whatever can be automated will be welcomed by the industry and government. We try to specialize in this, but will also do the full range of big-data analytics.”
The fundamental research conducted in the CHMPR center is also expected to translate into technologies that will aid industry, federal agencies and government agencies. The delivery of practical solutions to difficult problems will aid in precompetitive research, and will help create forward-oriented opportunities for industry and government.
According to Chirkova, the work being done at NC State complements the research being done in the Center. “At NC State, we are a premier place for data science and big-data analytics; it’s great to know NC State is an asset many people know about.”
CHMPR comprises the University of Maryland, Baltimore County; the University of California, San Diego; the University of Utah; Rutgers University; and, now, NC State. The overall CHMPR program is in the second phase of IUCRC support.
For more information on CHMPR and the semiannual meeting, visit: https://www.steed.ncsu.edu/chmpr-planning-meeting/.
NC State Libraries Workshops
Chief Technology Officer
Andrew Odewahn, Chief Technology Officer at O’Reilly Media, will demonstrate a suite of tools, including GitHub, Docker, and Jupyter, and how they are being used in O’Reilly Media’s publishing workflows. Andrew will also present new projects geared towards blending code, data, text, and video into a narrated learning experience with executable content.
This is a bring your own laptop event.
Many organizations have a huge opportunity to take advantage of a new generation of open source analytical and data visualization tools. However, because they’re often built by hackers for hackers, these tools usually rely on a complex system of dependencies and concepts that can make them much more difficult to adopt than purpose-built tools from proprietary vendors. So while the benefits in terms of cost savings and rapid innovation are huge, reaping the rewards often requires organizations to rethink processes and workflows as a software process. For almost 5 years, O’Reilly Media has centered its publishing processes around tools like Jupyter, git, GitHub, Docker, and a host of open source packages.
In this talk, Andrew Odewahn, Chief Technology Officer at O’Reilly Media, will talk about some of the opportunities and challenges encountered in making the shift from traditional media to software development.
The event is free and open to the public.
2016 Triangle Statistics Genetics Conference
Monday, October 31st
Theme: “The Genome and the Environment”
9:20 – 4:00
8:45 Check-In & Continental Breakfast
Executive Briefing Center, SAS Campus
Pierre Bushel (NIEHS)
Denis Fourches (NCSU)
Elizabeth Jensen (Wake Forest U.)
Carol Hamilton (RTI)
Michael Love (UNC)
Kelci Miclaus (SAS)
Sayan Mukherjee (Duke)
Ellie Rahbar (Wake Forest U.)
David Reif (NCSU)
Praveen Sethupathy (UNC)
ClarLynda Williams (NCCU)
Yi-Hui Zhou (NCSU)
Fred Wright (NCSU, chair), Andrew Allen (Duke),
Carl Langefeld (WFU), Yun Li (UNC),
Kelci Miclaus (SAS), Jung-Ying Tzeng (NCSU),
Liling Warren (Acclarogen)
Register here by October 26. Limited to the first 145 registrants.
Sunday, October 23rd 2016, 9am-7pm,
Four Seasons Ballroom, Las Vegas NV
IEEE Big Data Initiative, the Business-Higher Education Forum, and IBM are partnering to host a one-day workshop on Sunday, October 23, just before World of Watson. The workshop is suited for academics responsible for artificial intelligence, machine learning, cognitive science, citizen analyst, data science, or data engineering curriculum and programs.
Many schools have moved to establish specialist data science programs, but is that focus enough to meet the rapidly changing needs of employers?
Join the conversation on October 23rd to engage with academic and industry leaders. Learn how others are driving the evolution. Questions on the table for the day include:
- How will these emerging fields evolve?
- What competencies will every student require?
- What competencies will specialists require?
- What competencies will business expect students to acquire?
If you are responsible for curriculum and program design, or for building the talent pool for your organization, this event is tailor-made for you. Learn how to outthink the skills gap.
This special event immediately precedes World of Watson in Las Vegas which runs from October 24th to 27th.
- The future of Artificial Intelligence
by Michael Karasick, VP Cognitive Computing, IBM Research
- How should education evolve with cognitive and data science?
Closing panel hosted by Guru Banavar, Chief Science officer for Cognitive Computing, IBM Research
Hands-on workshops with emerging cognitive and data science technology
- Empower every student to have smarter data driven decision making
- Making data science a team sport
- Welcome to the cognitive era; a hands-on with Watson services
- The new internet of cognitive data driven things
Data Science Education
- Civic data as the material for learning: bridging the gap from the 4 year degree
Catherine Nikolovski, Hack Oregon
- The data science enabled professional
Brian Fitzgerald, Business-Higher Education Forum
- Building the data science profession
Yuri Demchenko, University of Amsterdam
- Data Science for Every Student at RPI
Peter Fox, Rensselaer Polytechnic Institute (30 minutes)
Cognitive Science Education
- Merging Tech & Marketing in a Cognitive World
Randy Hvalac, Northwestern
- Next generation cognitive curriculum
Jim Spohrer, IBM
- The nexus of society, data and cognitive sciences: Driving the evolution of curriculum
Nitesh Chawla, Notre Dame
- Co-Innovation Strategies for Empowering the Bottom of the Pyramid
Solomon Darwin UC Berkeley
- Cognitive as an enabler of tech entrepreneurship in Saudi Arabia
Artemisa Jaramillo, Princess Noura University
- Experiences integrating Watson into my courses
Gordon Pipa, Universität Osnabrück
Cognitive & Data Science Research
- A Transdisciplinary Holodeck for Research, Education, and Innovation
Winslow Burleson, New York University
- The power of Cognitive Computing for the Internet of Things
Eleni Pratsini, IBM Research
- Computational Argumentation
Aya Soffer, IBM Research
Want to learn more about the future of cognitive and data sciences?
Spend the week at World of Watson following the workshop.
Join an elite group of thought leaders, top academics, inspired architects, data scientists, developers, engineers, inventors and business leaders to explore the future of artificial intelligence, cognitive systems, and data science.
Register today. Qualified academics receive special conference pricing.
- IEEE Big Data
- Business Higher Education Forum
The Computing Research Association released a bulletin on the emerging field of Data Science on October 7, 2016. Excerpts are as follows:
By CRA’s Committee on Data Science: Lise Getoor (Chair), David Culler, Eric de Sturler, David Ebert, Mike Franklin, and H.V. Jagadish on behalf of the CRA Board
Our ability to collect, manipulate, analyze, and act on vast amounts of data is having a profound impact on all aspects of society. This transformation has led to the emergence of data science as a new discipline. The explosive growth of interest in this area has been driven by research in social, natural, and physical sciences with access to data at an unprecedented scale and variety, by industry assembling huge amounts of operational and behavioral information to create new services and sources of revenue, and by government, social services and non-profits leveraging data for social good. This emerging discipline relies on a novel mix of mathematical and statistical modeling, computational thinking and methods, data representation and management, and domain expertise. While computing fields already provide many principles, tools and techniques to support data science applications and use cases, the computer science community also has the opportunity to contribute to the new research needed to further drive the development of the field. In addition, the community has the obligation to engage in developing guidelines for the responsible use of data science.
Data science starts with a strong set of foundations adapted from several fields including statistics, mathematics, social science, natural sciences, and computer science. Already, virtually all aspects of traditional computer science research have played a role in the development of data science. And looking forward, data science will drive fundamentally new computing research.
- From a data management perspective, data science requires a much deeper understanding and representation of how data is acquired, stored, and accessed. Data lineage, data quality, quality assurance, data integration, storage, privacy, and security all need to be rethought. The traditional approach of acquisition, followed by storage, and then processing often does not work for high-rate or sensitive data.
- From a computational point of view, very large data volumes, very high data rates, and very large numbers of users, demand new systems and new algorithms. New system architectures that can accommodate the heterogeneity and irregular structure in data access and communication are needed. From an algorithmic perspective, there is a need for sublinear algorithms, online algorithms that support real-time data streams, and probabilistic and stochastic approaches to accommodate both scale and noise in the data.
- Furthermore, many classic statistical assumptions and machine learning techniques do not fit current data science needs. Often derived from natural sources, data is increasingly likely to be biased, incomplete and highly heterogeneous. Systematic errors arising in automated data collection and semantic inconsistencies that result from stitching data together from multiple sources across longer time horizons present profound modeling challenges and opportunities for the development of new statistical methods and machine learning algorithms. Even in the small data setting, new techniques that can cope with heterogeneity and biased sampling are needed. While predictive modeling is important, many data science problems involve decision making, and the ability to reason about alternate courses of action is needed. In addition, understanding the curse of dimensionality, overfitting, and causality in these complex settings is critical.
- The challenges in scale and heterogeneity also fundamentally change how users interact with data and models, how the data is visualized, what algorithms are needed to support understanding and interpretation of the results of data science models, how decisions are made, and how user feedback is acquired and incorporated. Human computer interaction and visual analytics will need to be more tightly integrated with data science models and algorithms. New use cases for natural language processing, speech, computer vision and other human-machine communication modes will emerge.
- Because data science systems are often embedded in operational systems with changing demands and distributions, supporting the entire data science lifecycle is important. Ensuring the robustness of all aspects of the pipeline is important. New software engineering and computer programming best practices will need to be developed. Additionally, data artifacts will often persist beyond their initially planned usage, so longer-term curation and management must also be addressed.
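The online, sublinear-memory algorithms mentioned above can be made concrete with a classic example (an illustrative sketch of our own, not taken from the CRA bulletin): reservoir sampling maintains a uniform random sample of k items from a data stream of unknown length, using only O(k) memory and a single pass.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items
            sample.append(item)
        else:
            # Item i replaces a reservoir slot with probability k/(i+1),
            # which keeps every item's inclusion probability uniform
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = item
    return sample
```

Because the algorithm never stores more than k items, it applies directly to real-time streams whose total size is unknown or too large to hold in memory.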
December 5-7, 2016, Barcelona, Spain
Call for Papers:
Data has always been recognized as an important asset for driving value, be it scientific, governmental, or enterprise. The high volume, velocity, and variety of data being generated have marked the emergence of “Big Data” as a contemporary research challenge with vast opportunities. In the digital world, the internet and social networks have contributed immensely to generating big data pools. Hundreds of millions of people around the globe are now connected through different types of social networks. Big data and social networks have become interrelated in the modern computing agenda, as both are concerned with data-intensive computation. For all the opportunities they hold, big data and social networks come with their own challenges, such as management, security, and processing. Unlocking the potential of these technologies requires innovative solutions that leverage new computing models. The 4th International Workshop on Big Data and Social Networking Management and Security will be a forum for scientists, researchers, students, and practitioners to present their latest research results, ideas, developments, and applications in the areas of big data and social networking. We are mainly interested in receiving state-of-the-art work on different aspects of big data management systems, big data security and privacy, cloud computing and big data, social networking and big data analytics, and social networking management and security, to mention but a few.
The topics of interest for this workshop include, but are not limited to:
- Big data and Social Networking concepts and applications
- Emerging technologies in Big Data and Social Networking
- Management Issues of Social Network Big Data
- Security challenges in Big Data and Social networks
- Social Network and Big Data Analytics
- Open Source tools for Big Data
- Green Computing for Big Data
- Network Infrastructure for Social Networking and Big Data
- Social Networks Monitoring Tools As a Service
- Cloud Computing for Big Data and Social Networks
- Big Data and the Internet of Things
- Big Data Management
- Big Data and Decision Making
- Visualization tools for Big Data
- Mobile Cloud networks and Big Data
- Social network data analysis tools and services on the Cloud
- Case studies
- Paper Submission: September 15, 2016
- Author’s Notification: October 15, 2016
- Camera Ready and Registration Deadline: October 30, 2016
Instructions for Authors
Prospective authors are invited to submit full papers of up to 6 pages, strictly following the IEEE Proceedings Templates. Submitted papers will be peer-reviewed and prospective authors are expected to present their papers at the conference. The papers that are accepted and presented at the conference will appear in conference proceedings. At least one author of each accepted submission must attend the workshop.
Please submit your paper in PDF format via the electronic submission system
When: September 21-23, 2016
Where: Statistical and Applied Mathematical Sciences Institute (SAMSI), 19 T.W. Alexander Drive, Research Triangle Park, NC 27709-4006
The workshop aims to bring academic researchers and industrial engineers together for the exploration and scientific discussions on recent challenges faced by practitioners and related theories and proven best practices in both academia and industries on distributed data analytics.
In recent work in computational mathematics and machine learning, great strides have been made in distributed optimization and distributed learning. For example, by enforcing ‘consensus’ between local variables and a global variable, the Alternating Direction Method of Multipliers (ADMM) algorithm can be used to solve a distributed version of the LASSO problem. On the other hand, classical statistical methodology, theory, and computation are based on the assumption that the entire dataset is available at a central location; this is a significant shortcoming in modern problem solving. Computation on a single machine can be thousands of times faster than data transmission between locations.
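The consensus ADMM approach to distributed LASSO mentioned above can be sketched in a few lines. The sketch below (our illustration; function names and parameter choices are ours, not the workshop's) splits the data by rows across simulated workers: each worker solves a small ridge-like subproblem on its own shard, and only the averaged local estimates are communicated for the global soft-thresholding step.

```python
import numpy as np

def soft_threshold(v, k):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def consensus_lasso_admm(A_parts, b_parts, lam, rho=1.0, n_iter=200):
    """Consensus ADMM for LASSO: min (1/2)||Ax-b||^2 + lam*||x||_1,
    with rows of (A, b) partitioned across K workers."""
    K = len(A_parts)
    n = A_parts[0].shape[1]
    x = [np.zeros(n) for _ in range(K)]  # local estimates
    u = [np.zeros(n) for _ in range(K)]  # scaled dual variables
    z = np.zeros(n)                      # global consensus variable
    # Factor each worker's (A_i^T A_i + rho*I) once, reused every iteration
    chols = [np.linalg.cholesky(A.T @ A + rho * np.eye(n)) for A in A_parts]
    Atb = [A.T @ b for A, b in zip(A_parts, b_parts)]
    for _ in range(n_iter):
        # Local x-updates: each worker solves its ridge subproblem independently
        for i in range(K):
            rhs = Atb[i] + rho * (z - u[i])
            y = np.linalg.solve(chols[i], rhs)
            x[i] = np.linalg.solve(chols[i].T, y)
        # Global z-update ('consensus'): average, then soft-threshold
        x_bar = np.mean(x, axis=0)
        u_bar = np.mean(u, axis=0)
        z = soft_threshold(x_bar + u_bar, lam / (rho * K))
        # Dual updates push local estimates toward the consensus
        for i in range(K):
            u[i] += x[i] - z
    return z
```

Only the n-vectors x_i and u_i cross machine boundaries in a real deployment; the raw data shards A_i, b_i never move, which is precisely the communication saving the workshop description highlights.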
Specific goals of the workshop include (i) exposing academic researchers to both the challenges in industrial applications and the current computing tools being used in industry, (ii) introducing industrial researchers to the frontiers of applied mathematical and statistical methods for distributed inference, and (iii) educating graduate students and early-career researchers about practical computing and theoretical studies in distributed analytics. The workshop will begin with a few tutorial-type lectures, followed by lectures and panels on state-of-the-art methods by leading researchers and practitioners in this emerging field of mathematics.
The workshop will be limited to about 50 participants and funding support priority will be given to U.S. based researchers.
Questions: email firstname.lastname@example.org
WHEN: WHERE: Keck Center – 500 Fifth St. NW Room 100, Washington, D.C. 20001
The Committee on Applied and Theoretical Statistics invites you to attend a two-day workshop on the challenges of applying scientific inference to big data. The workshop will bring together statisticians, data scientists and domain researchers from different biomedical disciplines to explore four key issues of scientific inference:
- Inference about causal discoveries driven by large observational data
- Inference about discoveries from data on large networks
- Inference about discoveries based on integration of diverse datasets
- Inference when regularization is used to simplify fitting of high-dimensional models
The aim of the workshop is to identify new developments that hold significant promise and to highlight potential research program areas for the future. Please contact Michelle Schwalbe at email@example.com with any questions.
The first edition of the Mission-Critical Big Data Analytics workshop (MCBDA 2016) will take place on May 16-17, 2016 at Prairie View A&M University, Prairie View, TX, USA. This two-day workshop will cover the state of the art in big data analytics research and development, especially for mission-critical applications. There will be keynote speeches by renowned invited speakers, and tutorial sessions with hands-on training. Research on cutting-edge topics in big data will be presented, plus a poster session and a demo session where students will present their work.
Registration fee is $200, including all the technical sessions, USB proceedings, workshop brochure and bag, two lunches on May 16 and 17, the welcome reception in the evening of May 16, and a tour of the Prairie View A&M University campus. Registration is open: Click here
September 14-16, 2016
Hosted by IBM Emerging Technologies
Research Triangle Park, NC
What is PyData?
PyData conferences bring together users and developers of data analysis tools to share ideas and learn from each other. The PyData community gathers to discuss how best to apply Python tools, as well as tools using R and Julia, to meet evolving challenges in data management, processing, analytics, and visualization. We aim to be an accessible, community-driven conference, with tutorials for novices, advanced topical workshops for practitioners, and opportunities for package developers and users to meet in person.
The event brings together analysts, scientists, developers, engineers, architects and others from the data science community to discuss new techniques and tools for management, analytics and visualization of data. PyData welcomes presentations focusing on Python as well as other languages used in data science (e.g. R, Julia). Presentation content can be at a novice, intermediate or advanced level.
Talks will run 30-40 minutes each (45 minute slots including time for questions) with 30 available talk slots. Twelve tutorials are planned, six running 90 minutes each, and six running 2 hours each. As a reminder, PyData presentations are intended to share knowledge and experience. We welcome talks letting attendees know how you are using tools in your work, but discourage any proposals with the aim of selling a product.
If you are interested in presenting a talk or tutorial, we encourage your submission(s). To see the type of topics presented at previous PyData events, please look at our past conference sites at pydata.org or check out the videos on YouTube or Vimeo.