
Getting New Insights From Old Data

Dr. Alper Bozkurt does his research on Centennial Campus in EBII. Photo by Marc Hall

“It was the best of times, it was the worst of times.” Although he was writing about the turbulent era of the French Revolution, Charles Dickens could just as easily have been describing the current landscape for secondary data analysis. On the one hand, advances in machine learning, artificial intelligence, quantum computing, and cloud storage make this a golden era for data science. On the other hand, the SARS-CoV-2 pandemic has made this the worst of times for new data collection, with lab closures, protocols on hold, and travel freezes giving researchers new incentives to dust off existing data sets for new analysis. So, for those of you who are thinking about taking a new look at old data, the PDU offers some advice on pursuing research support.

Make your data FAIR. While many funding agencies are taking a new look at funding secondary data analysis, they want to see that the re-investment in the data will be broadly useful. You may see funding opportunities citing the FAIR principles: data should be Findable, Accessible, Interoperable, and Reusable. The NC State Libraries have a number of tools and resources to help you apply FAIR principles to your data sets. If you have a data set that you think could be broadly useful with just a small investment in curation, consider pursuing a research resource grant mechanism, such as the NIH’s U24 Data Repository program, rather than a traditional research grant mechanism.

Don’t shortchange data management in your research grants. Often, when thinking about a new grant application, PIs will budget for the materials, supplies, and effort needed to collect data, but neglect to request funds for data curation and management later in the project. Several data repositories, such as NIMH’s NDAR and USDA’s eOrganics portal, have created tools and resources for estimating how much to budget for data management and data sharing. Taking advantage of data curation resources early on may make a big difference in the value of the data for later re-use.

Consider using someone else’s data. Federal agencies have been incentivizing several of their large dataset owners to be more proactive about sharing access to their data. If you have a big idea, but not the dataset to back it up, take a look around for other resources that may be available. For example, as part of the NC TraCS consortium, NC State faculty can request to work with the Carolina Data Warehouse to get limited access to UNC system electronic health records data. If you do decide to use an external data set, be sure to build in plenty of proposal development time to get to know the data, and request a data dictionary if possible to confirm that the variables you need are part of the set. Also, recognize that most funding agencies will want to see a Letter of Support or some other evidence that the data owner will be able to share the data with you. You may also need to budget funds for the time and effort needed to retrieve and format data sets for sharing.

As we adapt to these interesting times, pursuing research opportunities in secondary data analysis may let you test new hypotheses and form new collaborations, all while maintaining your social distance.