Standardised citation information for biological and environmental datasets

kbase/credit_engine

Dataset Credit Engine

This repo holds the schema and associated scripts used by the Dataset Credit Engine.

The Dataset Credit Engine is a project that aims to ensure that appropriate citation information exists for data entering or produced by biological and environmental research platforms, so that credit can be attributed to those who produced the data.

Metadata Schema

The dataset credit metadata schema is maintained in LinkML format; other formats (including Python classes) can be generated from the LinkML schema file.

See the LinkML documentation for full details on using the LinkML format and the related tools.

Full schema documentation can be found at https://kbase.github.io/credit_engine/.

Schema Diagram

The diagram is generated from the Pydantic version of the Dataset Credit Metadata Schema using erdantic.

[Figure: dataset credit metadata schema diagram]

See the Useful commands section for how to regenerate the ER diagram after making changes to the schema.

Software Installation

This repo uses uv to manage the Python environment and dependencies.

See the uv docs for uv installation instructions.

Install the project dependencies and create a virtual environment:

uv sync

Run tests or other scripts:

uv run <command>
uv run pytest tests/

Useful commands

These commands assume that you have already run uv sync to create the credit engine virtual environment and install its dependencies.

Generate derived files in all formats and save them to the project/ directory:

uv run gen-project -d project/ schema/dcm/linkml/credit_metadata.yaml

Lint the LinkML schema file:

uv run linkml-lint -f terminal schema/dcm/linkml/credit_metadata.yaml

Validate data (in the file data.yaml) against the schema:

uv run linkml-validate -s schema/dcm/linkml/credit_metadata.yaml data.yaml

Generate the JSON Schema version:

uv run gen-json-schema schema/dcm/linkml/credit_metadata.yaml > schema/dcm/jsonschema/credit_metadata.schema.json

Generate Python classes:

uv run gen-python schema/dcm/linkml/credit_metadata.yaml > schema/dcm/python/credit_metadata.py

Generate Pydantic classes:

uv run gen-pydantic schema/dcm/linkml/credit_metadata.yaml > schema/dcm/python/credit_metadata_pydantic.py

Generate an ER diagram from the Pydantic classes using erdantic (assumes that erdantic is already installed):

uv run erdantic schema.dcm.python.credit_metadata_pydantic.CreditMetadata -o schema/dcm/dcm-schema.png

Generate a yUML schema diagram (can be visualised at yuml.me):

uv run gen-yuml schema/dcm/linkml/credit_metadata.yaml
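
The JSON Schema, Python, and Pydantic commands above each write a generator's stdout to a fixed file, so they can be grouped into one script. The following is a minimal sketch, not part of the repo: the names `GENERATORS`, `build_commands`, and `regenerate` are hypothetical, and the paths are copied from the commands above. It defaults to a dry run that only prints the commands.

```python
"""Sketch: regenerate the derived schema files listed above in one pass."""
import subprocess
from pathlib import Path

# Paths copied from the commands above.
SCHEMA = "schema/dcm/linkml/credit_metadata.yaml"

# Generator executable -> file that receives its stdout (hypothetical grouping).
GENERATORS = {
    "gen-json-schema": "schema/dcm/jsonschema/credit_metadata.schema.json",
    "gen-python": "schema/dcm/python/credit_metadata.py",
    "gen-pydantic": "schema/dcm/python/credit_metadata_pydantic.py",
}


def build_commands(schema: str = SCHEMA) -> list[tuple[list[str], str]]:
    """Return (argv, output_path) pairs, one per generator."""
    return [(["uv", "run", tool, schema], out) for tool, out in GENERATORS.items()]


def regenerate(dry_run: bool = True) -> None:
    """Print each command; with dry_run=False, also run it and save its stdout."""
    for argv, out in build_commands():
        print(" ".join(argv), ">", out)
        if dry_run:
            continue
        # check=True raises CalledProcessError if a generator exits non-zero.
        result = subprocess.run(argv, check=True, capture_output=True, text=True)
        # The file is written only after the generator has succeeded.
        Path(out).write_text(result.stdout)


regenerate()  # dry run: only prints the three commands
```

Calling `regenerate(dry_run=False)` from the repository root would run the generators. Because stdout is captured and written only on success, a failing generator leaves the existing derived file untouched, whereas a shell `>` redirect truncates the target before the command runs.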

JSON Schema data validation

Install the check-jsonschema tool:

# install with Homebrew
brew install check-jsonschema

or

# install with pip
pip install check-jsonschema

To test a file or files against the schema, use the command:

check-jsonschema --schemafile schema/dcm/jsonschema/credit_metadata.schema.json data_file_1.json data_file_2.json

or

check-jsonschema --schemafile schema/dcm/jsonschema/credit_metadata.schema.json sample_data/**/*_dcm.json
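
The `sample_data/**/*_dcm.json` pattern depends on the shell expanding `**` recursively. zsh does this by default, but bash only does so after `shopt -s globstar`; otherwise `**` behaves like a single `*`. As a shell-independent alternative, the sketch below expands the pattern in Python and then builds the same check-jsonschema command. The function names are hypothetical and the script is not part of the repo.

```python
"""Sketch: expand the recursive pattern in Python, then call check-jsonschema."""
import subprocess
from pathlib import Path

# Path copied from the commands above.
SCHEMA_FILE = "schema/dcm/jsonschema/credit_metadata.schema.json"


def find_data_files(root: str = "sample_data") -> list[str]:
    """Return every *_dcm.json file under root, at any depth, sorted."""
    return sorted(str(p) for p in Path(root).rglob("*_dcm.json"))


def build_command(files: list[str]) -> list[str]:
    """Build the check-jsonschema argv shown above for the given files."""
    return ["check-jsonschema", "--schemafile", SCHEMA_FILE, *files]


def validate(root: str = "sample_data") -> int:
    """Run check-jsonschema on all matching files and return its exit code."""
    files = find_data_files(root)
    if not files:
        print(f"no *_dcm.json files found under {root}")
        return 0
    return subprocess.run(build_command(files)).returncode
```

Calling `validate()` from the repository root returns the exit code of check-jsonschema, which can be passed to `sys.exit` in a script or CI job.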
