RED Talks Video Archive


Date: 6pm, October 29, 2018
Location: 2203 SAS Hall

Tensors are multiway arrays, and they arise naturally in many areas of data analysis. Consider a series of experiments tracking multiple sensors over time, resulting in a three-way tensor of the form experiment-by-sensor-by-time. Tensor decompositions are powerful tools for data analysis that can be used for data interpretation, dimensionality reduction, outlier detection, and estimation of missing data. In this talk, we consider the mathematical, algorithmic, and computational challenges of tensor methods and highlight their wide-ranging utility with examples in neuroscience, chemical detection, social network analysis, and more. We discuss several new developments, including a new “generalized” version of tensor decomposition that allows for alternative statistically motivated fitting functions.
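The “experiment-by-sensor-by-time” example above can be made concrete with a small sketch (the sizes and NumPy usage here are illustrative assumptions, not from the talk): a rank-1 tensor is the outer product of one factor vector per mode, which is the basic building block of the CP-style decompositions the abstract refers to.

```python
import numpy as np

# A three-way tensor: experiments x sensors x time steps
# (4 experiments, 3 sensors, 100 time samples -- illustrative sizes).
rng = np.random.default_rng(0)

# Build an exactly rank-1 tensor as the outer product of one
# factor vector per mode.
a = rng.standard_normal(4)    # experiment factors
b = rng.standard_normal(3)    # sensor factors
c = rng.standard_normal(100)  # time factors
tensor = np.einsum('i,j,k->ijk', a, b, c)

# Mode-1 unfolding: flatten the sensor and time modes into columns.
unfolded = tensor.reshape(4, -1)

# A rank-1 tensor has a rank-1 unfolding; a full CP decomposition
# approximates a data tensor as a sum of several such rank-1 terms.
print(np.linalg.matrix_rank(unfolded))  # -> 1
```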
Click here to view the recorded talk. (Requires Login)

Photo of Guillermo Sapiro

It’s Your Data and Your Decision: Learning Representations to Keep Your Privacy

Guillermo Sapiro, Duke University


Date: 5pm, October 15, 2018

Location: Duke Energy Hall C/D, Hunt Library, Centennial Campus, NCSU.


It is becoming increasingly clear that users should own and control their data. Utility providers are also becoming more interested in guaranteeing data privacy. As such, users and utility providers should collaborate on data privacy, a paradigm that has not yet been developed in the privacy research community. We introduce this concept and present explicit architectures in which the user controls which characteristics of the data she/he wants to share and which she/he wants to keep private. This is achieved by collaboratively learning a sanitization function, either deterministic or stochastic, that retains the information valuable for the utility tasks while eliminating the information needed for the privacy ones. As illustrative examples, we implement this using a plug-and-play approach, where no algorithm is changed at the system provider’s end, and an adversarial approach, where minor re-training of the privacy-inferring engine is allowed. In both cases the learned sanitization function keeps the data in the original domain, thereby allowing the system to use the same algorithms it was using before for both original and privatized data. We show how we can maintain utility while fully protecting private information if the user chooses to do so, even when the former is harder than the latter, as in the case illustrated here of identity detection while hiding gender. We also present examples showing how secure devices can be designed to protect the privacy of ordinary people around the device. This talk is based on joint work with M. Bertran, N. Martinez, A. Papadaki, Q. Qiu, and M. Rodrigues.


Photo of C. Titus Brown

The Secret Life of Microbial Genomes

(click above to see the taped video of the talk)

C. Titus Brown, UC Davis

Date: 6pm, September 11, 2018

Location: 2203 SAS Hall


Recent advances in large-scale sequencing of microbial DNA without culturing or isolation give us easy and direct access to “wild” microbial metagenomes that are otherwise virtually impossible to study. Our recent work has focused on studying the ecology and genomics of microbial genomes and “population” pan-genomes in environmental samples, using tools and approaches developed in our lab and in collaboration with others. We see many ways in which microbial genomes do not particularly resemble our naive expectations, which is leading us into some productive confusion around known knowns, known unknowns, and unknown unknowns in environmental microbial systems. Our work relies heavily on novel methods and technical infrastructure development. In this talk I will present a series of vignettes on our techniques and some of the results, aimed at a general scientific audience.

Click here to see the recorded talk.


Dr. Ingrid Daubechies 

Photo of Ingrid Daubechies

When: 10 October, 2017 in Talley Student Union

Title: Mathematicians Helping Art Historians and Art Conservators

Abstract: In recent years, mathematical algorithms have helped art historians and art conservators piece together the thousands of fragments into which an unfortunate WWII bombing shattered world-famous frescoes by Mantegna, determine that certain paintings by masters were “roll mates” (their canvases were cut from the same bolt), virtually remove artifacts in preparation for a restoration campaign, and even gain more insight into paintings hidden underneath a visible one. The presentation will review these applications and give a glimpse into the mathematical aspects that make this possible.

Photo of Elliot Inman

Dr. Elliot Inman

“Quantification: The Art of Making Data”
Sept 28th, 7:00 p.m.
Mountains Ballroom, Talley Student Union

In this Data Science Initiative talk and the DH Hill Makerspace workshops that follow, we will explore the art of making data – from the mechanics of setting up sensors and digitizing information to generating a data structure that will be useful for making sense of that information. We will experiment with a wide variety of data from recorded sound to human touch and even human emotions, building devices to help gather reliable, valid measures.

Long before there was such a thing as “Data Science” or a “Makerspace,” the engineer Herman Hollerith created an electromechanical tabulation machine, a device that allowed for the statistical summary of data via the use of paper cards with holes “punched” out to represent data. Those “punched card” systems, first built more than 100 years ago, enabled rapid, accurate tabulation of the US Census. They were extremely powerful, but extremely expensive.

These days, homebrew hobbyists, hackers, and makers have access to inexpensive microcontrollers and other electronic components necessary to build devices that can actually look and listen and digitize the output of silicon senses. For less than $50 in parts, inventors can build a device that will instantly publish a stream of binary artifacts to a global audience. But there is a significant difference between a flash-flood of timestamped binary records and a dataset with measures that will help to understand what is happening.

In this talk and the accompanying workshops, we will discuss how the technical specifications of the device/circuit affect how we collect information and, conversely, how our operational definition of the measures of interest should guide decisions about how to design those devices/circuits to gather useful data. We will discuss:

  • How quickly and how often do we need to digitize a signal to capture what we want to measure?
  • How should we structure output data differently depending on whether it is to be read by a person or a machine?
  • How much (or how little) do we need to save to a digital record to make sense of the data?
  • What kinds of additional data will we want to match to those records and how can we structure our output to allow us to do so?
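The first question above is the classic Nyquist concern: sample a signal at less than twice its highest frequency and the digitized record misrepresents it, with tones showing up at the wrong (aliased) frequencies. A minimal sketch of this effect, assuming a pure sine tone and using NumPy's FFT (illustrative only, not from the talk or workshops):

```python
import numpy as np

def dominant_freq(signal_hz, sample_rate, duration=1.0):
    """Sample a pure sine of signal_hz at sample_rate (Hz) and return
    the frequency (Hz) of the strongest non-DC bin in its spectrum."""
    n = int(sample_rate * duration)
    t = np.arange(n) / sample_rate          # sample times in seconds
    x = np.sin(2 * np.pi * signal_hz * t)   # the digitized record
    spectrum = np.abs(np.fft.rfft(x))
    idx = np.argmax(spectrum[1:]) + 1       # skip the DC bin
    return idx * sample_rate / n            # bin index -> frequency in Hz

# Sampling a 6 Hz tone at 100 Hz (well above the 12 Hz Nyquist rate)
# recovers 6 Hz; sampling the same tone at only 8 Hz aliases it to 2 Hz.
print(dominant_freq(6, sample_rate=100))  # -> 6.0
print(dominant_freq(6, sample_rate=8))    # -> 2.0
```

The same reasoning applies to a sensor on a microcontroller: the loop's sampling rate, not just the sensor itself, bounds what the resulting dataset can measure.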

This RED talk will coincide with the following three workshops in Makerspace at DH Hill Library:

  1. The Art of Making Data:  Quantifying Touch – Sept 29th, 1:00 – 3:00
    • We will use an Arduino and sensors to gather data on simple human hand gestures: pressing a button, turning a dial, and waving a hand in front of an electronic eye. We will set up the Arduino to save data in a manner that allows us to use the digitized records for statistical analysis.
  2. The Art of Making Data:  Quantifying Sound – Sept 29th, 3:30 – 5:30
    • We will build a simple contact microphone and record sounds using Audacity. We will set up Audacity to save sound data so that the digitized records can be used for statistical analysis.
  3. The Art of Making Data:  Quantifying Attitudes and Emotions – Sept 30th, 10:00 – 12:00
    • We will build an audience response meter using an Arduino to capture audience emotional responses to a video. We will match those data to the content of the video so that we can conduct a statistical analysis of the resulting data.

Note: For all workshops, all materials and software, including a no-cost online SAS Studio account for statistical analysis, will be provided to participating NCSU students. Please see the DH Hill Makerspace website to register for a particular session.

About Dr. Inman:  Elliot Inman, Ph.D., is a Manager of Software Development for SAS® Solutions OnDemand. Over the past 25 years, he has analyzed a wide variety of data in areas as diverse as the effectiveness of print and digital advertising, social service outcomes analysis, healthcare claims analysis, employee safety, educational achievement, clinical trial monitoring, sales forecasting, risk-scoring and fraud analytics, general survey analysis, performance benchmarking, product pricing, text processing, and basic scientific research on human memory and cognitive processes. After completing his undergraduate degree at North Carolina State University, he went on to earn a Ph.D. in Experimental Psychology from the University of Kentucky in 1997.

Reception to immediately follow in the Piedmont Ballroom.

Photo of Laura Haas

Dr. Laura Haas

“The Power Behind The Throne: Information Integration in the Age of Data-Driven Discovery”
Oct 18th, 7:00 p.m.
Duke Energy Hall, Hunt Library


Abstract: Integrating data has always been a challenge. The information management community has made great progress in tackling this challenge, in both theory and practice. But in the last ten years, the world has changed dramatically. New platforms, devices, and applications have made huge volumes of heterogeneous data available at speeds never contemplated before, while the quality of the available data has, if anything, degraded.

Unstructured and semi-structured formats and NoSQL data stores undercut the old reliable tools of schema, forcing applications to deal with data at the instance level. Deep expertise in the data and domain, in the tools and systems for integration and analysis, and in mathematics, computer science, and business is needed to discover insights from data, but rarely are all of these skills found in a single individual or even a single team. Meanwhile, the availability of all these data has raised expectations for rapid breakthroughs in many sciences, for quick solutions to business problems, and for ever more sophisticated applications that combine and analyze information to solve our daily needs. In the Accelerated Discovery Lab, we support data scientists working with a broad range of data as they try to find the insights to solve problems of business or societal importance. I will describe the environment we are creating, the advances in the field that enable it, and the challenges that remain.

About Dr. Haas:   Laura Haas is an IBM Fellow and Director of IBM Research’s Accelerated Discovery Lab, which is creating a plug-and-play environment to facilitate deriving insight from data. The environment will meet dual goals: (1) to enable research in and improvements to the tools and systems that facilitate discovery, and (2) to enable the business person or domain expert who uses the environment to focus on their investigations, alleviating the systems and data challenges to speed discovery.

She was the director of computer science at IBM Almaden Research Center from 2005-2011, and had worldwide responsibility for IBM Research’s exploratory science program from 2009 through 2013.  Previously, she was senior manager for Information Integration Solutions Architecture and Development in IBM’s Software Group.  Dr. Haas was one of the founders of Information Integration Solutions, which today includes InfoSphere Information Server and a suite of products that can integrate both structured and unstructured data via federation and materialization.  Dr. Haas managed the development team in its first two years, moving to lead the architecture team for the product suite with the acquisition of Ascential Software in 2005.  Before joining Information Integration, she managed the DB2 UDB Query Compiler development, including key technologies for information integration and the Life Sciences industry, such as federated database and XML Query.

Dr. Haas spent the first twenty years of her career as a research staff member and manager at IBM’s Almaden Research Center in San Jose.  Dr. Haas joined IBM in 1981 as a research staff member on the R* distributed relational database management project.  She worked on the design and implementation of many aspects of the R* system, including catalog management, query processing, and distributed execution protocols.  She was co-manager of the Starburst extensible database system project from its inception through November 1989.   Technology from this project forms the basis of the DB2 UDB query processor.  Dr. Haas then headed the Exploratory Database Systems Department at IBM Almaden for three and a half years.  After a sabbatical year at the University of Wisconsin, Madison, she returned to start a new project at IBM on heterogeneous middleware systems (Garlic). Garlic technology married with DB2 UDB query processing is the basis for InfoSphere Information Server’s federation capabilities, and for the earlier DiscoveryLink offering for Life Sciences R&D.  Dr. Haas served as “acting CTO” for the DiscoveryLink offering in its initial year, while managing an exploratory research project on schema mapping (Clio).  She joined IBM’s Software Group in March 2001.  Before joining IBM, Dr. Haas studied Applied Mathematics and Computer Science at Harvard University, and Computer Science at the University of Texas at Austin, where she received her PhD in 1981.  Her dissertation explored the problem of deadlocks in distributed systems.

Dr. Haas was vice-chair of the Association of Computing Machinery (ACM) Special Interest Group on the Management of Data (SIGMOD) from 1989-1997.  She has served as an Associate Editor of the ACM journal Transactions on Database Systems, as Program Chair of the 1998 ACM SIGMOD technical conference and the IIS track of the 2005 VLDB conference, as general co-chair of VLDB 2008, and as Vice President of the VLDB Endowment Board of Trustees (from 2004-2009). She received an IBM Corporate Award for her work on federated database technology, IBM awards for Outstanding Technical Achievement for her work on R* and on DiscoveryLink, an IBM Outstanding Contribution award for Starburst, an Outstanding Innovation Award for Clio, a YWCA Tribute to Women in INdustry (TWIN) award, the Anita Borg Institute’s Technical Leadership Award, and the ACM SIGMOD Codd Innovation Award.  Dr. Haas is an ACM Fellow and a Fellow of the American Academy of Arts and Sciences, a member of the National Academy of Engineering, on the NRC’s Computer Science and Telecommunications Board (CSTB) and past Vice-Chair of the board of the Computing Research Association.

Reception to immediately follow in Duke Energy Hall.

Photo of Jeff Leek

Dr. Jeff Leek

“Is Most Published Research Really False?”
Nov 2nd, 7:00 p.m.
Mountains Ballroom, Talley Student Union

Abstract: The accuracy of published research is critical for scientists, physicians and patients who rely on these results. But the fundamental belief in the scientific literature was called into serious question by a paper suggesting most published medical research is false. This claim has launched an entire discipline focused on the crisis of reproducibility and replicability of science. In this talk I will discuss two major open problems inspired by this scientific crisis: how do we know when a study replicates and what is the rate of false discoveries in the scientific literature? In answering these questions I will argue that much of the crisis in science can be attributed to misunderstanding statistics.

About Dr. Leek:   Jeff Leek is an Assistant Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health and co-editor of the Simply Statistics Blog. He received his Ph.D. in Biostatistics from the University of Washington and is recognized for his contributions to genomic data analysis and statistical methods for personalized medicine. His data analyses have helped us understand the molecular mechanisms behind brain development, stem cell self-renewal, and the immune response to major blunt force trauma. His work has appeared in the top scientific and medical journals Nature, Proceedings of the National Academy of Sciences, Genome Biology, and PLoS Medicine. He created Data Analysis as a component of the year-long statistical methods core sequence for Biostatistics students at Johns Hopkins. The course has won a teaching excellence award, voted on by the students at Johns Hopkins, every year Dr. Leek has taught the course.


Photo of Deen Freelon

Dr. Deen Freelon

“Toward a Framework for Inferring Individual-Level Characteristics from Digital Trace Data”
Mar 15, 5:30 p.m.
Mountains Ballroom, Talley Student Union

Digital traces—records automatically generated by the servers that undergird all online activity—allow us to explore age-old communication research questions in unprecedented ways. But one of the greatest challenges in doing so is managing the gap between the research’s conceptual focus and the set of readily available traces. Not every type of trace will be equally valuable from a particular research standpoint, and not every interesting concept will be measurable using the traces to which researchers have access.

The purpose of this presentation is to contribute to the development of a framework for assessing the construct validity of conceptual inferences drawn from digital traces. In it, I will define four platform-independent domains researchers should bear in mind when choosing traces for analysis: technical design, terms of service (TOS), social context, and the potential for misrepresentation. I will illustrate the value of this framework in discussions of three individual-level characteristics of broad interest to communication researchers and others: gender, race/ethnicity, and geographic location.

About Dr. Freelon: Deen Freelon is an associate professor in the School of Communication at American University in Washington DC. He has two major areas of expertise: 1) political expression through digital media, and 2) the use of code and computational methods to extract, preprocess, and analyze very large digital datasets. Freelon has authored or co-authored over 30 journal articles, book chapters, and public reports, in addition to co-editing one scholarly book. He has served as co-principal investigator on grants from the Spencer Foundation and the US Institute of Peace thus far. He is the creator of ReCal, an online intercoder reliability application that has been used by thousands of researchers around the world; and TSM, a network analysis module for the Python programming language.