GitHub

This is the git repository for the DURIN project. The goal of this repo is to organize and streamline the data management in the project and beyond.

DATA MANAGEMENT

Location of data, metadata and code

The overview of all the datasets is here.

The raw and clean datasets from are stored and will be made available after the end of the project on OSF. For now the data is only available to the project partners.

All R code for the cleaning the raw data is available on the Durin GitHub.

The data documentation, including a draft for the data paper is available here (only available for authors), and the data dictionaries are in this below in this readme file.

Naming conventions for the datasets and files

We use snake_case style for names and coding. Snake_case means that we use lower case letters separated by an underscore, for example biomass_g or age_class. The one exceptionare siteID, blockID, plotID (legacy from previous projects).

File and variable names should be meaningful. Do not use my_data.csv or var1. Follow the naming convention for the main variables (see full description of the main variables in the table below).

We use redundancy in variable names to avoid mistakes. PlotID should contain the higher hierarchical levels such as siteID, blockID, treatment, habitat, species etc. For example plotID contains siteID, habitat, species and plot number: LY_O_VV_1, LY_F_VM_3

Files or variable	Naming convention	Example
File name	Project_Status_Experiment_Response_Year(s).Extension	DURIN_clean_gradient_cflux_2023-2025.csv
	Project	DURIN
	Experiment	4 corner = 4 main Durin sites, drought = drought experiment at Lygra and Tjotta, gradient = gradient study, nutrient = nutrient experiment

4 corner study
date	Date of data collection	yyyy-mm-dd; do not split year, month and day into several columns
year	Year of data collection	yyyy; sometimes there is no specific date, then year can be used
site_ID_name	Unique full site name	Lygra, Sogndal, Senja, Kautokeino and Tjotta
siteID	Unique siteID, first 2 letters of site_ID_name.	LY, SO, SE, and KA, TJ
habitat	Open versus forested habitat	O, F
species	Vascular plant taxon names follow for Norway Lid & Lid (Lid J & Lid, 2010). We use full species names. For field sheets the names can be abbreviated (see speciesID), but the clean data should contain the full species name	Vaccinium myrtillus
speciesID	2 letter abbreviation of species	VM, VV, CV, EN, BN
plot_nr	Unique plot number, numeric value from 1-5.	1-5
plotID	Unique plot ID as a combination of siteID, habitat, speciesID and plot number	LY_O_VV_1, LY_F_VM_1
variable	Response variable(s)	cover, biomass, Reco
value	Value of response variable(s)	numeric value
unit	Unit for response variable(s)	%, µmol m−2 s−1
other variables	Other important variables in the dataset	plant nr, remark, data collector, weather, flag

DroughtNet experiment
site_ID_name	Site name	Lygra, Tjotta
siteID	Unique siteID, first 2 letters of site_ID_name.	LY, TJ
age_class	Vegetation representing post-fire successional stages.	pioneer, building, mature
drought_treatment	Drought treatment using rain-out shelters that reduce precipitation by 0 = ambient, 60 = moderate, or 90% = extreme	ambient, moderate, extreme
plotID	Unique plot ID as a combination of siteID, age_class, drought_treatment and plot number	LY_P_PIO_1, LY_M_EXT_3
droughtNet_plot	Unique plotID from the DroughtNet frames in the field. Correspond with Landpress naming.	1.1,1.2,1.3 - 9.1,9.2,9.3
species	Vascular plant taxon names follow for Norway Lid & Lid (Lid J & Lid, 2010). We use full species names. For field sheets the names can be abbreviated (see speciesID), but the clean data should contain the full species name	Vaccinium myrtillus
speciesID	2 letter abbreviation of species	VM, VV, CV, EN

Nutrient experiment
date	Date of data collection	yyyy-mm-dd; do not split year, month and day into several columns
site_ID_name	Site name	Lygra
siteID	Unique siteID, first 2 letters of site_ID_name.	LY
habitat	Open versus forested habitat	O, F
age_class	Vegetation representing post-fire successional stages.	pioneer, building, mature
nitrogen_addition	Added level of nitrogen	5, 10, 25, 30
plot_nr	Unique plot number	1-5
plotID	Combination of siteID, habitat, age_class, nitrogen addition and plot number.	LY_O_BUI_10_3

Gradient study
date	??	?
siteID	??	?
habitat	Open versus forested habitat	O, F
plot_nr	Unique plot number	1-5
plotID	Combination of siteID, habitat, and plot number.	??

Organize data sets

Each dataset should contain only one response variable, or several if they are closely related. E.g. biomass and carbon flux should be two separate datasets. But the functional trait dataset can contain several traits, and the carbon flux dataset can contain GPP, NEE and Reco.

Datasets should be in a long format. If you have several response variables (e.g. traits, flux measurements), then use pivot_long(cols = ..., names_to = "variable", values_to = "value") to conver the data to a long format. Using the names variable and value for the columns is a standard. If you have very different response variables, it can be useful to have a column called unit.

When dealing with many different datasets it can be useful to structure them in a similar way. Arrange the dataset so that the important variables come first, preferable use this order:

Year and date
siteID
plotID
habitat
treatment
plantID
leafID
variable (diversity, biomass, flux)
value
(unit)
predictor_variables (temperature level, oceanity)
other_variables (remark, data collector, weather)

Data dictionary

How to make a data dictionary?

The R package dataDocumentation that will help you to make the data dictionary. You can install and load the package as follows:

# if needed install the remotes package
install.packages("remotes")

# then install the dataDocumentation package
remotes::install_github("audhalbritter/dataDocumentation")

# and load it
library(dataDocumentation)

Make data description table

Find the file R/data_dic/data_description.xlsx. Enter all the variables into that table, including variable name, description, unit/treatment level and how measured. If the variables are global for all of Funder, leave TableID blank (e.g. siteID). If the variable is unique for a specific dataset, create a TableID and use it consistently for one specific dataset. Make sure you have described all variables.

Make data dictionary

Then run the function make_data_dic().

data_dic <- make_data_dictionary(data = biomass,
                                 description_table = description_table,
                                 table_ID = "biomass",
                                 keep_table_ID = FALSE)

Check that the function produces the correct data dictionary.

Add data dictionary to readme file

Finally, add the data dictionary below to be displayed in this readme file. Add a title, and a code chunk using kable() to display the data dictionary.

For more details go to the dataDocumentation readme file.

Name		Name	Last commit message	Last commit date
Latest commit History 99 Commits
R		R
pics		pics
.gitignore		.gitignore
Durin_data.Rproj		Durin_data.Rproj
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DATA MANAGEMENT

Location of data, metadata and code

Naming conventions for the datasets and files

Organize data sets

Data dictionary

About

Releases

Packages

Languages

hrdawson/Durin_data_project

Folders and files

Latest commit

History

Repository files navigation

DATA MANAGEMENT

Location of data, metadata and code

Naming conventions for the datasets and files

Organize data sets

Data dictionary

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages