brianhigh edited this page Nov 25, 2015 · 8 revisions

Data Management

Four Perspectives

We see the term "data management" used in several different ways, which generally fall under the four perspectives below. We have added parenthetical terms to help distinguish them.

"(Preparatory) Data Management"

Statisticians, data analysts, and data scientists tend to use "data management" to mean the specific manipulations needed to clean up and restructure raw data prior to data analysis.[1] They slice, dice, filter, combine, rename, and reshape data as needed for their analysis. They sometimes refer to this task as data "conditioning", "clean-up", "wrangling", "munging", or "tidying". To a statistician, "managed" data are ready for use in statistical analysis.[2]
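As a rough illustration of this kind of preparatory work, here is a minimal Python sketch that uses only the standard library. The records, column names, and the `tidy` helper are all hypothetical and invented for this example; analysts would more often do this with tools such as R or a data-frame library. It renames a column, combines the measurements with a lookup table, reshapes one-row-per-subject data into one-row-per-visit data, and filters out missing values.

```python
# Hypothetical raw data: one row per subject, one column per visit ("wide" layout).
raw_rows = [
    {"Subj ID": "001", "site": "A", "visit1": "5.1", "visit2": "5.4"},
    {"Subj ID": "002", "site": "B", "visit1": "", "visit2": "6.0"},
    {"Subj ID": "003", "site": "A", "visit1": "4.8", "visit2": "NA"},
]

# Hypothetical lookup table to combine with the measurements.
site_names = {"A": "North Clinic", "B": "South Clinic"}

# Strings treated as missing values in this example (an assumption).
MISSING = {"", "NA"}


def tidy(rows, sites):
    """Rename, combine, reshape (wide to long), and filter raw rows."""
    long_rows = []
    for row in rows:
        for visit in (1, 2):
            value = row[f"visit{visit}"].strip()
            if value in MISSING:
                continue  # filter: drop missing measurements
            long_rows.append({
                "subject_id": row["Subj ID"],     # rename the awkward column
                "site_name": sites[row["site"]],  # combine with the lookup table
                "visit": visit,                   # reshape: one row per visit
                "value": float(value),            # convert text to a number
            })
    return long_rows


tidy_rows = tidy(raw_rows, site_names)
for record in tidy_rows:
    print(record)
```

The result has one observation per row and one variable per column, which is the shape most statistical routines expect.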

"(Research) Data Management"

Librarians, archivists, and curators often use "data management" to mean preserving and sharing research data in organized, centralized, and searchable systems. They procure, archive, curate, catalog, and publish data.[3] Ask them about metadata and taxonomies to see their eyes light up. They may offer to help you create your data management plan for free. To them, "managed" data form a collection that has been preserved and made available as a resource.[4]

"Data(base) Management"

Computer scientists, information technologists, and database administrators (DBAs) think of designing and managing "databases": traditionally "relational database management systems" (RDBMS) and, more recently, "NoSQL" databases. They model, import, index, and export data, with security, performance, scalability, collaboration, and data integrity in mind. To them, "managed" data are organized and maintained in a properly designed and administered data system, not simply kept as a loose collection of files.[5]
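The model, import, index, and export steps can be sketched with Python's built-in `sqlite3` module. This is only a small illustration under stated assumptions: the `subject` and `measurement` tables, their columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a properly administered database server.

```python
import csv
import io
import sqlite3

# An in-memory database stands in for a real, administered system (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce foreign keys

# Model: hypothetical tables with keys and constraints to protect integrity.
conn.execute(
    "CREATE TABLE subject (subject_id TEXT PRIMARY KEY, site TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE measurement (
           subject_id TEXT NOT NULL REFERENCES subject(subject_id),
           visit INTEGER NOT NULL CHECK (visit > 0),
           value REAL NOT NULL,
           PRIMARY KEY (subject_id, visit)
       )"""
)

# Import: load hypothetical rows using parameterized statements.
conn.executemany("INSERT INTO subject VALUES (?, ?)", [("001", "A"), ("002", "B")])
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [("001", 1, 5.1), ("001", 2, 5.4), ("002", 2, 6.0)],
)

# Index: speed up lookups by visit.
conn.execute("CREATE INDEX idx_measurement_visit ON measurement(visit)")

# Integrity: a measurement for an unknown subject should be rejected.
try:
    conn.execute("INSERT INTO measurement VALUES ('999', 1, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Export: join the tables and write the result as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["subject_id", "site", "visit", "value"])
writer.writerows(
    conn.execute(
        """SELECT s.subject_id, s.site, m.visit, m.value
           FROM subject AS s
           JOIN measurement AS m ON m.subject_id = s.subject_id
           ORDER BY s.subject_id, m.visit"""
    )
)
exported_csv = buffer.getvalue()
print(exported_csv)
```

Unlike a loose collection of files, the database itself refuses the inconsistent row, so the integrity rules do not depend on every collaborator remembering them.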

"(Enterprise) Data Management"

Professional data managers, corporate IT managers, and CIOs think of "data management" in terms of the policies, procedures, and architectures needed to properly manage an organization's entire data lifecycle.[6] This includes building and operating enterprise data systems, which consist of operational systems designed and optimized for routine transactions and one large, centralized, integrated system, or "data warehouse", optimized for long-term storage and analysis.[7] To them, data worth collecting are worth "managing" to ensure consistency, security, and usability.

How do you think of data management? If you are a scientific researcher, we hope you will consider all of these perspectives as you conduct your research. Plan to spend time organizing your data according to your requirements, allocate computing and other resources to the task, budget accordingly, and think about your collaborators and how you might make accessing the data easier (and more secure) for the entire team. Most importantly, consider data management early in your project. Design your experiment with data management in mind.[8] What data will be most helpful in answering your research question? How will you collect, store, process, and publish your data most effectively?


1. Horton, N., & Kleinman, Ken. (2015). Data Management. Using R and RStudio for data management, statistical analysis, and graphics, Second ed. (pp. 11-31).
2. Baumer, B. (2015). A Data Science Course for Undergraduates: Thinking with Data. The American Statistician, 00. doi: 10.1080/00031305.2015.1081105.
3. Surkis A, Read K. Research data management. Journal of the Medical Library Association : JMLA. 2015;103(3):154-156. doi:10.3163/1536-5050.103.3.011.
4. Pollock, Ludmila. Data management: Librarians or science informationists? Nature. 2012;490,343. doi:10.1038/490343d. Published online 17 October 2012.
5. Database, Wikipedia, CC BY-SA 3.0
6. Data management, Wikipedia, CC BY-SA 3.0
7. Data warehouse, Wikipedia, CC BY-SA 3.0
8. Smith, P., Morrow, R., & Ross, D. (2015). Data Management. Field Trials of Health Interventions, 3rd ed. (pp. 338-364). Oxford University Press. Retrieved Nov. 25, 2015, from http://www.oapen.org/search?identifier=569923.