datamgmt
We see the term "data management" used in several different ways, which generally fall under the four perspectives below. We have added parenthetical terms to help distinguish the differences.
Statisticians, data analysts, and data scientists tend to use "data management" to mean the specific manipulations needed to clean up and restructure raw data before analysis. They slice, dice, filter, combine, rename, and reshape data as needed for their analysis. They sometimes call this task data "conditioning", "clean-up", "wrangling", "munging", or "tidying". To a statistician, "managed" data are ready for use in statistical analysis.
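As a rough illustration of the statistician's view, here is a minimal sketch of three of these steps (filter, rename, and reshape) written in plain Python with only the standard library. The field names, values, and helper functions (`clean`, `rename`, `long_to_wide`) are hypothetical. In practice, analysts usually rely on a dedicated data-frame library for this work.

```python
# A minimal sketch of common "wrangling" steps using only the Python
# standard library. All field names and values are hypothetical.

raw = [
    {"subj": "A", "visit": 1, "wt_kg": "70.2"},
    {"subj": "A", "visit": 2, "wt_kg": "69.8"},
    {"subj": "B", "visit": 1, "wt_kg": ""},      # missing value
    {"subj": "B", "visit": 2, "wt_kg": "81.5"},
]

def clean(rows):
    """Filter out rows with a missing weight and convert text to numbers."""
    return [
        {**r, "wt_kg": float(r["wt_kg"])}
        for r in rows
        if r["wt_kg"].strip() != ""
    ]

def rename(rows, mapping):
    """Rename keys (columns) according to mapping; other keys are unchanged."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]

def long_to_wide(rows, id_key, col_key, val_key):
    """Reshape long data (one row per measurement) into wide data (one row per id)."""
    wide = {}
    for r in rows:
        row = wide.setdefault(r[id_key], {id_key: r[id_key]})
        row[f"{val_key}_{r[col_key]}"] = r[val_key]
    return list(wide.values())

tidy = rename(clean(raw), {"subj": "subject", "wt_kg": "weight"})
wide = long_to_wide(tidy, "subject", "visit", "weight")
for row in wide:
    print(row)
```

Each step returns a new list instead of modifying its input, so the raw data stay untouched and the whole pipeline can be rerun from scratch.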
Librarians, archivists, and curators often use "data management" to mean preserving and sharing research data in organized, centralized, and searchable systems. They procure, archive, curate, catalog, and publish data. Ask them about metadata and taxonomies to see their eyes light up. They may even offer to help you write your data management plan for free. To them, "managed" data is a collection that has been preserved and made available as a resource.
Computer scientists, information technologists, and database administrators (DBAs) think of data management as designing and managing "databases": traditionally "relational databases" (SQL) and, more recently, "NoSQL" databases. They model, import, index, and export data with security, performance, scalability, collaboration, and data integrity in mind. To them, "managed" data is organized and maintained in a specific data system, not simply stored as a collection of files.
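To make the DBA's view more concrete, here is a minimal sketch of the model, import, index, and export cycle using Python's built-in `sqlite3` and `csv` modules and an in-memory database. The table name `samples`, its columns, the index name, and the sample values are all hypothetical. A shared, multi-user system would more likely run on a database server, but the steps are the same in spirit.

```python
import csv
import io
import sqlite3

# Hypothetical CSV text standing in for a raw data file.
CSV_TEXT = """site,sample_date,pm25
north,2015-01-01,12.1
south,2015-01-01,8.4
north,2015-01-02,15.3
"""

def build_database(csv_text):
    """Model the data, import the CSV rows, and add an index."""
    conn = sqlite3.connect(":memory:")
    # Model: a typed schema with constraints that protect data integrity.
    conn.execute("""
        CREATE TABLE samples (
            site        TEXT NOT NULL,
            sample_date TEXT NOT NULL,
            pm25        REAL CHECK (pm25 >= 0),
            PRIMARY KEY (site, sample_date)
        )
    """)
    # Import: convert values in Python, then use parameterized inserts.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["site"], r["sample_date"], float(r["pm25"])) for r in reader]
    with conn:  # commits the transaction on success
        conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    # Index: speed up queries that filter or sort by date.
    conn.execute("CREATE INDEX idx_samples_date ON samples (sample_date)")
    return conn

def export_csv(conn):
    """Export the table, with a header row, as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    cur = conn.execute(
        "SELECT site, sample_date, pm25 FROM samples ORDER BY site, sample_date"
    )
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return out.getvalue()

conn = build_database(CSV_TEXT)
print(export_csv(conn))
```

Because `(site, sample_date)` is declared as the primary key, inserting a second reading for the same site and date raises `sqlite3.IntegrityError` instead of silently duplicating data, which is one small example of the integrity guarantees a file-based collection does not provide.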
Corporate IT managers and CIOs think of "data management" in terms of building and operating organization-wide data systems, ideally one large, fully integrated system or "data warehouse". Assortments of small databases (and spreadsheets) strewn about the organization are the stuff of their nightmares. To them, "managed" data is under centralized control to ensure consistency, security, and availability.
How do you think of data management? If you are a scientific researcher, we hope you will consider all of these perspectives as you conduct your research. Plan to spend time organizing your data to meet your requirements, allocate computing and other resources to the task, budget accordingly, and think about how you might make accessing the data easier (and more secure) for your collaborators and the entire team. Most importantly, consider data management early in your project, and design your experiment with it in mind. What data will be most helpful in answering your research question? How will you collect, store, process, and publish your data most effectively?
The latest version of this document is online at: https://github.com/brianhigh/research-computing/wiki Copyright © The Research Computing Team. This information is provided for educational purposes only. See LICENSE for more information. Creative Commons Attribution 4.0 International Public License.