A collection of JSON schemas defining data models for digital archiving systems based on the OAIS (Open Archival Information System) reference model. These schemas provide standardized definitions for various components of a data archive.
The project defines JSON schemas for key entities in a digital archiving workflow:
- Producer: Entity that provides data to be archived
- Deposit: Information about a submission to the archive
- SIP (Submission Information Package): Package of information submitted to the archive
- IntellectualEntity: Conceptual object being preserved
- Representation: Digital manifestation of an intellectual entity
- File: Individual digital file within a representation
- Fixity: Integrity information for digital files
All schemas are currently at version 0.1.0, indicating this is an early-stage project.
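To make the Fixity entity more concrete, the sketch below computes a checksum over a file's bytes and verifies it later. It is only an illustration, not code from this project: the function names `compute_checksum` and `verify_checksum` are hypothetical, and the record keys simply mirror the `algorithm` and `value` fields of the Checksum class in the model. The schema may restrict which algorithm labels are allowed.

```python
import hashlib


def compute_checksum(data: bytes, algorithm: str = "sha256") -> dict:
    """Return a checksum record for the given bytes.

    The keys "algorithm" and "value" are illustrative; the real schema
    may constrain the algorithm to a fixed enumeration.
    """
    digest = hashlib.new(algorithm)
    digest.update(data)
    return {"algorithm": algorithm, "value": digest.hexdigest()}


def verify_checksum(data: bytes, record: dict) -> bool:
    """Recompute the digest and compare it with the stored value."""
    recomputed = compute_checksum(data, record["algorithm"])
    return recomputed["value"] == record["value"]


if __name__ == "__main__":
    payload = b"example file content"
    record = compute_checksum(payload)
    print(record["algorithm"], record["value"])
    print("unchanged:", verify_checksum(payload, record))
    print("modified:", verify_checksum(payload + b"!", record))
```

Storing the algorithm next to the digest lets an archive verify older files even after its default hash function changes.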
This project is in alpha and actively evolving.
- Stability: Bugs, breaking changes, or incomplete features may occur.
- Evolution: APIs and behaviors can change as we refine functionality.
Provided as‑is, with no guarantees on stability.
We appreciate your patience and feedback!
- Java 21
- Maven
- Python 3.12
- Nix with flakes enabled (recommended)
- direnv for environment management (recommended)
We recommend the fully automatic setup with Nix flakes and direnv:
- Nix package manager with flakes enabled
- direnv for environment management
- Clone the repository
- Allow direnv in the project directory:
direnv allow
This will automatically:
- Create a Python 3.12 virtual environment in .venv
- Install all dependencies using the uv package manager
- Set up the development environment
If you'd like to activate the environment manually without direnv:
nix develop
Python models are automatically generated from JSON schemas:
# Generate Python models from JSON schemas
datamodel-codegen \
--input-file-type jsonschema \
--input schemas/data-archive/ \
--output src/data_archive/ \
--output-model-type pydantic_v2.BaseModel \
--field-constraints \
--use-schema-description
The generated models use Pydantic v2 and are stored in the src/data_archive/ directory.
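If the generation step needs to be scripted, one option is a thin Python wrapper around the same command. This is a sketch under assumptions: the helper names `build_codegen_command` and `regenerate_models` are hypothetical, and the wrapper expects `datamodel-codegen` to be available on the PATH. The flags are exactly those shown above.

```python
import subprocess
from pathlib import Path


def build_codegen_command(schema_dir: Path, output_dir: Path) -> list[str]:
    """Assemble the datamodel-codegen invocation as an argument list."""
    return [
        "datamodel-codegen",
        "--input-file-type", "jsonschema",
        "--input", str(schema_dir),
        "--output", str(output_dir),
        "--output-model-type", "pydantic_v2.BaseModel",
        "--field-constraints",
        "--use-schema-description",
    ]


def regenerate_models(schema_dir: Path, output_dir: Path) -> None:
    """Run the generator; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_codegen_command(schema_dir, output_dir), check=True)


if __name__ == "__main__":
    # Only print the command here; call regenerate_models() to execute it.
    command = build_codegen_command(
        Path("schemas/data-archive"), Path("src/data_archive")
    )
    print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues if a path ever contains spaces.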
Java classes are automatically generated from JSON schemas using the jsonschema2pojo Maven plugin:
<!-- Plugin configuration in pom.xml -->
<plugin>
<groupId>org.jsonschema2pojo</groupId>
<artifactId>jsonschema2pojo-maven-plugin</artifactId>
<version>1.2.1</version>
<configuration>
<sourceDirectory>${project.basedir}/schemas/data-archive</sourceDirectory>
<outputDirectory>${project.build.directory}/generated-sources/</outputDirectory>
<targetPackage>ch.ethz.library.darc.model</targetPackage>
<excludes>
<exclude>_shared/**</exclude>
<exclude>catalog.json</exclude>
</excludes>
</configuration>
</plugin>
The plugin is executed during the Maven prepare-package phase and generates Java classes from the JSON schemas in the schemas/data-archive/ directory. The generated classes are stored in the target/generated-sources/ directory under the package ch.ethz.library.darc.model.
To generate the Java classes, you can use one of the Maven commands listed in the Java Build Options section.
The project uses uv for Python dependency management:
# Generate lock file
uv lock
# Install dependencies
uv sync
# Build Python package
uv build
The project provides several Maven build commands:
- Validate JSON schemas only:
mvn -Dtest=JsonSchemaValidationTest test
- Generate Java classes without validation (skip tests):
mvn prepare-package -DskipTests
- Standard build (validate schemas then generate classes):
mvn package
The following class diagram visualizes the relationships between the different entities in the data archive model. The diagram is automatically generated from the Pydantic models in the codebase.
classDiagram
class Schema {
+str name
+str version
+AnyUrl url
}
class DataArchiveModelCatalog {
%% Catalog of all schemas in the Data Archive Model
+List[Schema] schemas
}
class Deposit {
%% An OAIS Deposit entity representing a submission from a Producer
+oais_base_defs.Identifier identifier
+oais_base_defs.DateTime dateCreated
+Producer producer
+str name
+Optional[Status] status
+List[SIP.SubmissionInformationPackage] sips
}
class SubmissionInformationPackage {
%% An OAIS Submission Information Package (SIP) entity
+oais_base_defs.Identifier identifier
+str name
+Producer producer
+List[IntellectualEntity] intellectualEntities
+Optional[State] state
}
class Producer {
%% An OAIS Producer entity that creates and submits content
+oais_base_defs.Identifier identifier
+oais_base_defs.DateTime dateCreated
+str name
+Optional[str] contact
}
class File {
%% An OAIS File entity representing a digital file
+oais_base_defs.Identifier identifier
+Optional[oais_base_defs.DateTime] dateCreated
+Optional[str] name
+str path
+List[Fixity] fixities
}
class Fixity {
%% An OAIS Fixity entity representing integrity information for a digital file
+oais_base_defs.Identifier identifier
+List[oais_base_defs.Checksum] checksums
}
class Representation {
%% An OAIS Representation entity representing a specific form of an Intellectual Entity
+oais_base_defs.Identifier identifier
+str name
+List[File] files
}
class IntellectualEntity {
%% An OAIS Intellectual Entity representing a conceptual object
+oais_base_defs.Identifier identifier
+List[Representation] representations
}
class OaisBaseDefinitions {
%% Common definitions used across OAIS schemas
}
class Checksum {
+Algorithm algorithm
+str value
}
DataArchiveModelCatalog *-- Schema : contains many
Representation *-- File : contains many
Deposit *-- SubmissionInformationPackage : contains many
SubmissionInformationPackage *-- Producer : contains
IntellectualEntity *-- Representation : contains many
SubmissionInformationPackage *-- IntellectualEntity : contains many
Fixity *-- Checksum : contains many
File *-- Fixity : contains many
Deposit *-- Producer : contains
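The containment relationships in the diagram can be illustrated with plain dataclasses. This is a simplified stand-in, not the generated Pydantic code: identifiers are plain strings, optional fields and dates are omitted, and the helpers `count_files` and `build_example` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Checksum:
    algorithm: str
    value: str


@dataclass
class Fixity:
    identifier: str
    checksums: list[Checksum] = field(default_factory=list)


@dataclass
class File:
    identifier: str
    path: str
    fixities: list[Fixity] = field(default_factory=list)


@dataclass
class Representation:
    identifier: str
    name: str
    files: list[File] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    identifier: str
    representations: list[Representation] = field(default_factory=list)


@dataclass
class Producer:
    identifier: str
    name: str


@dataclass
class SubmissionInformationPackage:
    identifier: str
    name: str
    producer: Producer
    intellectualEntities: list[IntellectualEntity] = field(default_factory=list)


@dataclass
class Deposit:
    identifier: str
    name: str
    producer: Producer
    sips: list[SubmissionInformationPackage] = field(default_factory=list)


def count_files(deposit: Deposit) -> int:
    """Walk Deposit -> SIP -> IntellectualEntity -> Representation -> File."""
    return sum(
        len(representation.files)
        for sip in deposit.sips
        for entity in sip.intellectualEntities
        for representation in entity.representations
    )


def build_example() -> Deposit:
    """Build a small deposit with one SIP, one entity, and two files."""
    producer = Producer(identifier="producer-1", name="Example Lab")
    fixity = Fixity("fixity-1", [Checksum("sha256", "00" * 32)])  # placeholder digest
    files = [
        File("file-1", "data/a.csv", [fixity]),
        File("file-2", "data/b.csv"),
    ]
    entity = IntellectualEntity("ie-1", [Representation("rep-1", "original", files)])
    sip = SubmissionInformationPackage("sip-1", "First SIP", producer, [entity])
    return Deposit("deposit-1", "Example deposit", producer, [sip])


if __name__ == "__main__":
    print(count_files(build_example()))
```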
The class diagram is generated using Python scripts located in the scripts directory:
- scripts/generate_mermaid_diagram.py: Generates a Mermaid class diagram from Pydantic models in the data_archive package
- scripts/update_readme.py: Updates this README.md file with the generated diagram
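The contents of generate_mermaid_diagram.py are not reproduced here. As a rough idea of the technique, the sketch below emits Mermaid class blocks by introspecting Python classes. It is an assumption-laden stand-in: it uses stdlib dataclasses rather than Pydantic models, and the function names `type_name` and `to_mermaid` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import get_args, get_origin


@dataclass
class Checksum:
    algorithm: str
    value: str


@dataclass
class Fixity:
    identifier: str
    checksums: list[Checksum]


def type_name(tp) -> str:
    """Render a type annotation without module prefixes, e.g. list[Checksum]."""
    origin = get_origin(tp)
    if origin is None:
        return getattr(tp, "__name__", str(tp))
    args = ", ".join(type_name(arg) for arg in get_args(tp))
    return f"{origin.__name__}[{args}]"


def to_mermaid(classes: list[type]) -> str:
    """Build a Mermaid classDiagram with attributes and composition edges."""
    lines = ["classDiagram"]
    for cls in classes:
        lines.append(f"class {cls.__name__} {{")
        for f in fields(cls):
            lines.append(f"+{type_name(f.type)} {f.name}")
        lines.append("}")
    # Add a composition edge when a field's type arguments name another class.
    for cls in classes:
        for f in fields(cls):
            for arg in get_args(f.type):
                if arg in classes:
                    lines.append(f"{cls.__name__} *-- {arg.__name__} : contains many")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid([Fixity, Checksum]))
```

With Pydantic models, the same idea would read field metadata from the model class instead of `dataclasses.fields`.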
To update the diagram:
python scripts/update_readme.py
This will:
- Scan the Pydantic models in the src/data_archive directory
- Generate a Mermaid class diagram
- Update the diagram in this README.md file between the marker comments
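The last step, replacing the text between marker comments, can be sketched as follows. The marker strings and the function name `replace_between_markers` are assumptions for illustration; the actual markers used by update_readme.py may differ.

```python
# Hypothetical marker comments; the real script may use different strings.
START_MARKER = "<!-- DIAGRAM_START -->"
END_MARKER = "<!-- DIAGRAM_END -->"


def replace_between_markers(text: str, new_body: str) -> str:
    """Replace everything between the two markers, keeping the markers."""
    start = text.find(START_MARKER)
    end = text.find(END_MARKER)
    if start == -1 or end == -1 or end < start:
        raise ValueError("marker comments not found or out of order")
    head = text[: start + len(START_MARKER)]
    tail = text[end:]
    return f"{head}\n{new_body}\n{tail}"


if __name__ == "__main__":
    readme = (
        "Intro text\n"
        f"{START_MARKER}\n"
        "old diagram\n"
        f"{END_MARKER}\n"
        "Closing text\n"
    )
    print(replace_between_markers(readme, "classDiagram\nclass File"))
```

Because the markers themselves are preserved, the update can be run repeatedly without duplicating the diagram.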
The project uses GitHub Actions for CI. The workflow automatically:
- Sets up the Nix environment shell
- Caches Maven dependencies for all jobs
- Runs schema validation tests
- Generates Java classes from JSON schemas
- Generates Python models from JSON schemas
- Publishes Java artifacts to GitHub Packages
- Publishes Python packages to TestPyPI