The research data management lifecycle illustrates the research process as it relates to data management. At each stage, data should be organized, annotated, and stored in ways that facilitate data sharing, reuse, and/or research validation.
At the start of any research project, you should think ahead about what data you will need to use (if any) during your research processes. You may need to pay to access, compute against, or store data, so knowing about these costs upfront can inform grant budgets.
Data might be collected from vendors, databases, or other researchers. If the data you are searching for does not exist, you may need to collect it yourself from multiple sources, or you may need to create the data.
Support at Yale:
- Reusing data from data repositories - strategies for finding appropriate data repositories
- Reusing data associated with publications - strategies for finding research data for reuse
- Collecting data from EHRs (Electronic Health Records) - YNHH Epic EHR data pull requests at Yale are placed through JDAT (the Joint Data Analytics Team)
After data is collected or created, you will most likely need to process or clean the data in some way. Data processing and cleaning can involve merging multiple datasets, selecting or filtering out specific portions of a dataset, standardizing categories found within a dataset, reorganizing spreadsheets that contain data, and more. When you create data groups that result in aggregation, data processing can start to bleed into data analysis.
Support at Yale:
- Data Support @ the Medical Library - email [email protected]
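The cleaning steps described above can be sketched in a few lines of Python. This is a minimal, hypothetical example (the dataset, column names, and values are all invented for illustration) showing category standardization, filtering, and a merge on a shared key:

```python
# Hypothetical "datasets" as lists of dicts; column names are invented.
visits = [
    {"patient_id": "p1", "site": "New Haven", "status": "Complete"},
    {"patient_id": "p2", "site": "new haven ", "status": "complete"},
    {"patient_id": "p3", "site": "Bridgeport", "status": "PENDING"},
]
labs = [
    {"patient_id": "p1", "a1c": 5.6},
    {"patient_id": "p3", "a1c": 7.1},
]

# Standardize inconsistent category labels (case and stray whitespace).
for row in visits:
    row["site"] = row["site"].strip().title()
    row["status"] = row["status"].strip().lower()

# Filter: keep only completed visits.
complete = [row for row in visits if row["status"] == "complete"]

# Merge the two datasets on the shared patient_id key.
labs_by_id = {row["patient_id"]: row for row in labs}
merged = [{**row, **labs_by_id.get(row["patient_id"], {})} for row in complete]
```

For larger datasets the same operations are usually done with a library such as pandas, but the logic is the same: standardize, filter, merge.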
Data analysis = generating findings from your data.
An important part of data analysis includes data visualization (i.e., graphs).
Support at Yale:
- Statistical help - StatLab
- Bioinformatics help - Bioinformatics Hub
- High Performance Computing & Parallel Computing - YCRC (Yale Center for Research Computing)
- Research & Analytics Clinics - YCAS (Yale Center for Analytical Sciences)
- Help designing and creating data visualizations - email [email protected]
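As a small illustration of generating findings from data, here is a sketch using Python's standard-library statistics module. The measurements are invented for the example:

```python
import statistics

# Hypothetical reaction-time measurements (ms); values are invented.
reaction_times = [412, 389, 450, 401, 398, 433, 420]

mean_rt = statistics.mean(reaction_times)
stdev_rt = statistics.stdev(reaction_times)  # sample standard deviation
print(f"mean = {mean_rt:.1f} ms, sd = {stdev_rt:.1f} ms")
```

Real analyses will usually involve more specialized tooling (R, Python scientific libraries, or the services above), but summary statistics like these are often the first step.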
Generally, research data and materials that are commonly accepted in the scientific community as necessary to validate research findings must be retained by Yale researchers for three (3) years after publication of the findings or after all required final reports (e.g., progress and financial) for the project have been submitted to the sponsor (Yale Policy 6001: Research Data & Materials Policy).
Data sharing refers to the process of making data public, typically via a data repository. Data retention refers to storing data so it remains usable, though not necessarily available to the public.
In addition to the (sometimes iterative) stages you will progress through during the research data lifecycle, there are also themes that you will need to consider during multiple, if not all, of these phases.
Version control allows you to see the change history of a file and to restore a file to a previous iteration. You can apply manual version control by adding dates or v1/v2/vfinal notations to a file name, or by writing a change log within a ReadMe file. Cloud data storage systems like Box and Google Drive have version control capabilities (Note: whenever you are exploring a data or content management system, check whether the system supports version control and how versions are retained).
The most robust and independent way to maintain control over your file versioning is to apply a Version Control System like Git.
Support at Yale:
- If you have questions about Git, or would like help getting started with Git or GitHub, email [email protected].
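The manual file-naming convention described above can itself be scripted. Here is a small Python sketch (the function name and naming pattern are one choice among many, not a standard):

```python
from datetime import date
from pathlib import Path

def versioned_name(path, when=None):
    """Return a date-stamped variant of a file name.

    A sketch of the manual versioning convention described above,
    e.g. results.csv -> results_2024-01-15.csv.
    """
    p = Path(path)
    stamp = (when or date.today()).isoformat()
    return p.with_name(f"{p.stem}_{stamp}{p.suffix}")

print(versioned_name("results.csv", date(2024, 1, 15)))  # results_2024-01-15.csv
```

A Version Control System like Git makes this bookkeeping automatic and records who changed what, which is why it is the more robust option.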
Documentation can include any notes and annotations related to your research data that make your data understandable to others (as well as your future self). Maintaining accurate and useful documentation can make the difference between your data being reusable in future research scenarios or not.
Support at Yale:
- Take a look at this additional information about Codebooks, Data Dictionaries & ReadMe Files and email [email protected] with any questions.
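One practical form of documentation is a codebook (or data dictionary) listing each variable in a dataset. The sketch below generates a codebook stub from a CSV header, leaving descriptions for the researcher to fill in; the file contents and column names are invented for illustration:

```python
import csv, io

# Hypothetical CSV data; in practice you would open your real file.
raw = io.StringIO("subject_id,age,group\ns01,34,control\ns02,29,treatment\n")

reader = csv.DictReader(raw)
rows = list(reader)

# One codebook entry per column, with an example value and a
# placeholder description for the researcher to complete.
codebook = [
    {"variable": col, "example": rows[0][col], "description": "TODO"}
    for col in reader.fieldnames
]
for entry in codebook:
    print(f"{entry['variable']}: example={entry['example']} ({entry['description']})")
```

Even a stub like this, saved alongside the data, gives future users a starting point for understanding each field.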
When choosing a data storage solution, you should think about how often you will be using this data, if others will need to access this data too, how much a data storage solution will cost, the level of risk associated with your data, the size of your data, and more.
Support at Yale:
- This interactive storage finder tool can help you navigate the various options available to you through Yale.
How can you know which software is cleared for moderate or high risk data? (And what are the classifications of moderate or high risk data?) Check with Yale Information Security.
When you start to think about how you would actually engage with any of these steps, different technologies come into play, along with cross-cutting themes such as version control, documentation, and operational data storage.
- DMPTool - access and store templated data management plans. Quick start guide
- To fill out a practice form that contains research data planning considerations, visit this Google Form
- Core Research Facilities - Yale’s Core Research Facilities provide Yale researchers access to state-of-the-art scientific instrumentation with the intent to keep Yale’s scientific research at the cutting edge. Each Core employs highly trained staff who can provide training and assistance with use of instrumentation as well as aid in experimental design.
- APIs (Application Programming Interfaces)
- Qualtrics - create and deploy
- Microsoft Excel
- Databases - databases are more robust for storing and organizing interrelated data structures than spreadsheets or tables. Email [email protected] with questions about relational database design and set-up.
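To make the spreadsheet-versus-database point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are invented for illustration, and the database is in-memory only:

```python
import sqlite3

# In-memory database for illustration; schema is hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE subjects (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE samples (
    id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES subjects(id),
    value REAL)""")
con.execute("INSERT INTO subjects VALUES (1, 'subject-01'), (2, 'subject-02')")
con.execute("INSERT INTO samples VALUES (1, 1, 0.42), (2, 1, 0.57), (3, 2, 0.31)")

# A join expresses interrelated data in a way spreadsheets handle poorly.
rows = con.execute("""
    SELECT s.name, COUNT(*) AS n_samples
    FROM subjects s JOIN samples ON samples.subject_id = s.id
    GROUP BY s.name ORDER BY s.name
""").fetchall()
print(rows)  # [('subject-01', 2), ('subject-02', 1)]
con.close()
```

The one-to-many relationship between subjects and samples lives in the schema itself, so it cannot silently drift out of sync the way duplicated spreadsheet rows can.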
- Microsoft Excel - Excel functions | Data processing/analysis in Excel
- Python - email [email protected] for a workshop or tutorial based on your research needs
- R - email [email protected] for a workshop or tutorial based on your research needs
- OpenRefine - A powerful tool for working with messy data, cleaning it, and transforming it from one format or structure to another
- There are many proprietary analysis tools; this document will focus on tools you can access for free and/or through Yale
- Find software through the Yale Software IT Library
- Microsoft Excel - Excel functions | Data processing/analysis in Excel
- Python - email [email protected] for a workshop or tutorial based on your research needs
- R - email [email protected] for a workshop or tutorial based on your research needs
- Learn about different types and categories of graphs via the Data Viz Catalogue
- Jump start your ability to create data visualizations in R with The R Graph Gallery
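Before building a polished figure in R or a Python plotting library, it can help to get a first look at a distribution directly in the terminal. This sketch prints a quick text bar chart; the frequency counts are invented for illustration:

```python
# Hypothetical frequency table; values are invented for illustration.
counts = {"control": 14, "treatment A": 22, "treatment B": 9}

# Build a quick text bar chart -- a rough first look at the data
# before creating a real figure with a plotting library.
width = max(len(label) for label in counts)
lines = [f"{label:<{width}} | {'#' * n} {n}" for label, n in counts.items()]
print("\n".join(lines))
```

The catalogues above can then help you choose an appropriate chart type for the final, publication-quality visualization.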
- Interactive storage finder tool
- Dryad - deposit research data for free through Yale. More about Dryad
- Zenodo - deposit research code and data. More about Zenodo
- Where can I find additional training opportunities?
- What are file and folder naming best practices?
- What if I want to make bulk changes to how my files are currently named?
- What are best practices related to file organization?
- Where can I connect with peers engaging in biomedical data science research?
- What about specific topics (training, software, hardware, and collaborative opportunities) in bioinformatics research?
- How can you know which software is cleared for moderate or high risk data? (And what are the classifications of moderate or high risk data?) Check with Yale Information Security
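On the bulk-renaming question above, a small script is often all you need. This Python sketch replaces spaces with underscores in every .csv file name in a folder; it runs on a throwaway temporary folder here for safety, and the file names are invented for illustration:

```python
from pathlib import Path
import tempfile

# Demonstration setup: a temporary folder with hypothetical file names.
# Point `folder` at your own directory to use the rename loop for real.
folder = Path(tempfile.mkdtemp())
(folder / "pilot data 1.csv").touch()
(folder / "pilot data 2.csv").touch()

# Bulk rename: replace spaces with underscores in each .csv file name.
for path in sorted(folder.glob("*.csv")):
    path.rename(path.with_name(path.name.replace(" ", "_")))

print(sorted(p.name for p in folder.iterdir()))
# ['pilot_data_1.csv', 'pilot_data_2.csv']
```

Always test a renaming script on a copy of your files first, since renames are hard to undo in bulk.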
Have other questions? Email [email protected]