dvs (data versioning system) is a file linker that allows teams to version files under Git without directly tracking them.
This R package allows teams to collaborate without uploading large or sensitive files to Git.
Instead of uploading data files to Git, a user can use dvs, which copies the files to a shared storage directory and generates metadata files. The user can upload these metadata files to Git to make the versioned files accessible to collaborators.
dvs generates a .gitignore in the directory of each versioned file that excludes the versioned file and includes its corresponding metadata file.
When collaborators pull from Git, they can use dvs to parse the metadata files, locate each corresponding data file copy in the storage directory, and copy the files back to the project directory.
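As an illustration of the .gitignore behavior described above, the entries for one versioned file might look like the following. The file name data.csv, the metadata file name data.csv.dvs, and the exact patterns are assumptions made for this sketch; the entries dvs actually writes may differ.

```gitignore
# Hypothetical entries for a versioned file named data.csv
# Exclude the versioned data file from Git
data.csv
# Keep its metadata file tracked
!data.csv.dvs
```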
A dvs.yaml file is generated in the project directory upon initialization; dvs reads the location of the storage directory from this file.
A .dvs metadata file is generated for each versioned file in its given directory.
A versioned file's metadata file contains a hash of the versioned file's contents, computed with the blake3 algorithm.
This hash is used both to track the most current version of the file and to create the path for the versioned file's copy in the storage directory.
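The two roles of the hash can be sketched in R as follows. This is a minimal illustration, not the code dvs runs: the helper names hash_file, storage_path, and is_current are hypothetical, the use of the digest package for blake3 hashing is an assumption, and the directory layout (first two characters of the hash as a subdirectory) is an assumed example of a hash-derived path.

```r
# Hash a file's contents with blake3.
# Assumption: the digest package is installed and supports algo = "blake3";
# dvs computes its hashes internally and may do so differently.
hash_file <- function(path) {
  digest::digest(path, algo = "blake3", file = TRUE)
}

# Build a hash-derived path inside the storage directory.
# Assumption: hypothetical layout that uses the first two characters of the
# hash as a subdirectory and the remaining characters as the file name.
storage_path <- function(hash, storage_dir) {
  file.path(storage_dir, substr(hash, 1, 2), substr(hash, 3, nchar(hash)))
}

# A local file is considered current when it exists and its hash matches
# the hash recorded in the metadata file.
is_current <- function(path, recorded_hash) {
  file.exists(path) && identical(hash_file(path), recorded_hash)
}
```

Because the path depends only on the file's contents, two files with identical contents would map to the same copy in the storage directory under this assumed layout, and any change to a file yields a new path.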
See a detailed tutorial here.
To version files and share them with collaborators:
Step 1: Initialize with dvs_init to set an accessible storage directory outside the Git repository.
dvs_init("/data/dvs/storage_directory")
Step 2: Add files to the storage directory with dvs_add.
dvs_add("data.csv")
Step 3: Push to Git.
To retrieve versioned files as a collaborator:
Step 1: Pull from Git.
Step 2: Generate a report with dvs_status to view versioned files.
dvs_status()
Step 3: Get files from the storage directory with dvs_get.
dvs_get("data.csv")