This page explains the design, architecture and implementation of the hydrus toolkit, along with a few use cases. It also covers the interactions and internals of a smart client (hydra-agent) connecting to the server.
To understand how hydrus represents REST resources and how it helps the developer work with Hydra, it helps to think of Hydra as a generic framework that describes REST API resources so that data exchanges can be automated.
Instances belonging to a Resource are named Items in hydrus. It is possible to perform HTTP operations over Items. At a slightly more abstract layer, the REST Resource is a kind of hydra:Resource, and all the instances of the same resource are members of a hydra:Collection. Because Hydra builds on RDF, the framework makes it possible to represent the API as an RDF graph.
hydrus lets the developer take advantage of this powerful description while abstracting away the complexity of RDF, so that work can focus on the REST interface layer.
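As a rough illustration of how Items in a hydra:Collection map onto an RDF graph, the sketch below flattens a hand-written collection document into (subject, predicate, object) triples. The URLs, the Fish class and the item names are hypothetical, and the flattening is done by hand for readability; it is not how hydrus or a real JSON-LD processor does it.

```python
# Hypothetical collection document: URLs, class and names are made up.
fish_collection = {
    "@id": "/api/FishCollection",
    "@type": "hydra:Collection",
    "hydra:member": [
        {"@id": "/api/FishCollection/1", "@type": "Fish", "name": "Joy"},
        {"@id": "/api/FishCollection/2", "@type": "Fish", "name": "Finn"},
    ],
}


def collection_to_triples(collection):
    """Flatten a collection document into (subject, predicate, object) triples."""
    triples = [(collection["@id"], "rdf:type", collection["@type"])]
    for item in collection["hydra:member"]:
        # The collection points at each of its members.
        triples.append((collection["@id"], "hydra:member", item["@id"]))
        for key, value in item.items():
            if key == "@id":
                continue
            predicate = "rdf:type" if key == "@type" else key
            triples.append((item["@id"], predicate, value))
    return triples


triples = collection_to_triples(fish_collection)
for triple in triples:
    print(triple)
```

Each Item contributes one membership statement plus one statement per field, which is the kind of graph view the REST layer hides from the developer.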
hydra-agent interacts with one or more hydrus instances to represent and navigate resources for the purpose of data consumption. The client-side tools in the ecosystem can be any client that complies with the Hydra specification, starting with the official TypeScript implementation Heracles.ts and python-hydra-agent.
The tools in the ecosystem work on top of a distributed architecture, described below from its foundational classes in the ORM up to the interface layer.
- RDF and Hydra
- hydrus-based cloud system
- hydrus full stack
- Multi-layered Database Design
- Data flow
- Use cases
For a short overview of RDF and Hydra see Home.
hydrus servers are highly decoupled web servers that allow multiple services to be installed in parallel. This is possible by design, as every hydrus instance is automatically queryable by a Hydra smart client (e.g. python-hydra-agent). A hydrus system can be composed of a single server or of several. Whatever the system's layout, a superuser/developer who engineers and develops the system can manage access privileges to its APIs. External smart clients can query the APIs in the system according to the privileges defined by the superuser. Here is a simple diagram of a cloud deployment with three hydrus "module-servers" and three external smart clients:
The different hydrus "modules" that make up a hydrus cloud deployment are designed to be highly decoupled Hydra-aware APIs. The design of the APIs follows the Hydra draft, so that smart clients can use their querying capabilities on hydrus-powered services. Here is an example of a Hydra network in a simple diagram:
The different hydrus instances (servers) live in the same cloud. Any Hydra-aware client with the right privileges can access the ApiDoc and the data in the servers and build its own representation of the data cloud. A hydrus instance may or may not have a client attached as well, to provide routing or connectivity to data stored in another instance.
Data storage capabilities are provided both in the server (hydrus) and in the smart client (hydra-agent), for different purposes:
- hydrus and its database (the grey box in the figure above) hold what we can call the data itself: the instances of the resources the system is serving. The ApiDoc is a JSON-LD string published by the server that describes classes of resources and their relationships (the instances' metadata, or schema). This database is a relational database with a table for every class; its schema is generated by parsing the ApiDoc.
- hydra-agent (the green box) and its datastore (the red component) hold the (meta-)data the client needs to know where to access and fetch the data. Initially, the datastore is a graph store in which the resource classes and their relationships are stored as a graph. As the smart client discovers the servers in the network, the graph is enriched with the resources found in other ApiDocs (metadata is stored by parsing documentation strings). When the smart client starts fetching instances, it can add to the graph the relationships between instances as described by their metadata, acting as a store for triples.
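To make the idea of deriving a schema from the ApiDoc more concrete, here is a minimal sketch that reads a hand-written ApiDoc fragment and lists one table per class with one column per supported property. The fragment, the vocab: prefix and the function name tables_from_api_doc are assumptions for illustration; the keys supportedClass, supportedProperty, property and title follow the Hydra vocabulary, but this is not the parser hydrus actually uses.

```python
# Hypothetical ApiDoc fragment, already loaded from its JSON-LD string.
api_doc = {
    "@id": "/api/vocab",
    "@type": "ApiDocumentation",
    "supportedClass": [
        {
            "@id": "vocab:Fish",
            "@type": "hydra:Class",
            "title": "Fish",
            "supportedProperty": [
                {"property": "vocab:name", "title": "name"},
                {"property": "vocab:liveHabitat", "title": "liveHabitat"},
            ],
        },
        {
            "@id": "vocab:WateryHabitat",
            "@type": "hydra:Class",
            "title": "WateryHabitat",
            "supportedProperty": [
                {"property": "vocab:name", "title": "name"},
            ],
        },
    ],
}


def tables_from_api_doc(doc):
    """Map each supported class to a list of column names (one per property)."""
    tables = {}
    for cls in doc.get("supportedClass", []):
        columns = [prop["title"] for prop in cls.get("supportedProperty", [])]
        tables[cls["title"]] = columns
    return tables


tables = tables_from_api_doc(api_doc)
for table, columns in tables.items():
    print(table, columns)
```

The same traversal could feed a graph store instead of a table list on the client side, since both views are derived from the same documentation.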
Annotations: the complete data of every instance reached by the client still resides only on the server; the client holds only its own representation of the state of the metadata. This creates significant challenges for data integrity and for keeping representations synchronized. For an overview of the challenge of distributed data networks see here. The Hydra ecosystem does not aim to develop a solution for distributed storage, but to implement the W3C Hydra Draft, focusing on tools for server-client and client-client interactions over HTTP that leverage the REST paradigm.
The design of both the database and the datastore takes into account some of the layers of representation that RDF makes possible. This multi-layered data layout aims to provide tools for building representations, using metadata and data concepts as useful abstractions.
Typically, statements (triples) are stored in the Graph in one of four layers. These layers make up the Knowledge Base that the REST layer queries.
NOTE: for the purposes of this text, the following pairs of words are synonyms:
- statement and triple
- predicate and property
Entities and properties are assigned to different layers according to their level of abstraction. The most abstract level is the one closest to the generic, most popular RDF ontologies/vocabularies. Moving closer to the REST interface, the levels become less abstract until it is possible to represent relationships between instances, or proper objects. As in any finite tree-like representation, this layout is closed by terminals, or values, that store the quantities themselves (strings, numbers, any data type). We call properties that relate "classes" to "classes" (like the ones at the most abstract level) AbstractProperty, and the others (relating less abstract kinds of entities) InstanceProperty.
Class >> Property >> Class [GraphCAC]
A statement that links two abstract classes is a "CAC" statement. This is the most abstract level of relationship stored. Two classes are related by an AbstractProperty that describes how they relate.
For example:
- the class Fish relates, through the property liveHabitat, to the class WateryHabitat.
- Furthermore, if we walk up the hierarchy in which this relation may be included, WateryHabitat could be in an rdfs:subClassOf relation with Habitat, a generic class for all habitats.
- Furthermore, HighPressureSubMarineHabitat could be an rdfs:subClassOf MarineHabitat, which is itself a subclass of WateryHabitat, and so on.

All of the properties above are considered AbstractProperty for the sake of storing them in hydrus' datastore.
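The habitat example can be sketched as a handful of class-to-class statements plus a helper that walks up the hierarchy. The tuples and the superclasses function are illustrative only; the sketch uses the standard RDF Schema property rdfs:subClassOf and assumes each class has at most one direct parent, which real ontologies do not guarantee.

```python
SUBCLASS_OF = "rdfs:subClassOf"

# Class-to-class ("CAC") statements from the example, written as plain tuples.
cac_statements = [
    ("Fish", "liveHabitat", "WateryHabitat"),
    ("WateryHabitat", SUBCLASS_OF, "Habitat"),
    ("MarineHabitat", SUBCLASS_OF, "WateryHabitat"),
    ("HighPressureSubMarineHabitat", SUBCLASS_OF, "MarineHabitat"),
]


def superclasses(cls, statements):
    """Return the ancestors of cls, nearest first, following rdfs:subClassOf.

    Assumes a single direct parent per class; stops if a cycle is detected.
    """
    parents = {s: o for s, p, o in statements if p == SUBCLASS_OF}
    chain = []
    seen = {cls}
    while cls in parents and parents[cls] not in seen:
        cls = parents[cls]
        seen.add(cls)
        chain.append(cls)
    return chain


ancestors = superclasses("HighPressureSubMarineHabitat", cac_statements)
print(ancestors)  # ['MarineHabitat', 'WateryHabitat', 'Habitat']
```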
This is a generic overview of how RDF relates classes of objects. The same logic also works with instances of objects (the fish named Joy is of kind Acanthurus coeruleus); families and kinds of objects can also be represented as classes. Very generic kinds of classes (like classes of relations) are described in vocabularies called "upper ontologies".
The paragraphs below describe less abstract statements.
Resource >> Property >> Class [GraphIAC]
A statement that links a Resource to an abstract class is an "IAC" statement. A Resource can also be seen as an instance representing a collection of instances (not a class in the abstract, but a more concrete set or group of objects). In the REST layer a Resource is addressed as Items. This kind of entity relates to an abstract class like the ones described in the "CAC" group. Statements of this class are stored in the "Graph IAC".
For example:
- the Resource that is the collection of Fish and the Resource that is the collection of Mollusca both have a property liveHabitat that points to WateryHabitat.
Resource >> Property >> Resource [GraphIII]
A statement that relates two instances or collections of items is a "III" statement. In this layer and in the ones described below, properties are of kind InstanceProperty.
For example:
- an instance of Resource/Class Fish can have a property sameHabitat pointing to an instance of Resource/Class Mollusca because they both live in the Sargasso Sea. In other words, the fish Joy has the same habitat as the mollusc Rachel. The Knowledge Base shows its power in that, within the same datastore, Joy may also be part of multiple statements, such as: Joy is of kind Teleost (an infraclass/family of fishes according to marine biologists).
Resource >> Property >> Value [GraphIIT]
A statement that relates an instance to a constant is an "IIT" statement. A Value can be a string, a number, or any other RDF-supported data type. Usually a value is also a Terminal, in the sense that it is an entity that stands for itself and is not related further to another object. Some values are not terminals because they contain a unit of measurement ("2.3 kilos"); in that case the unit of measurement can itself be semantically linked to an entity outside the instance ("kilos" can be semantically linked to the vocabulary describing Weights and Measures).
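The four layers can be summarized with a small classifier that assigns a statement to a layer based on what its subject and object are. This is only a sketch of the layering rule described above: the Layer enum, the sets of known classes and resources, and names such as FishCollection are hypothetical and do not mirror the actual hydrus or hydra-agent storage code.

```python
from enum import Enum


class Layer(Enum):
    CAC = "GraphCAC"  # Class    >> Property >> Class
    IAC = "GraphIAC"  # Resource >> Property >> Class
    III = "GraphIII"  # Resource >> Property >> Resource
    IIT = "GraphIIT"  # Resource >> Property >> Value


# Hypothetical knowledge about which names are classes and which are resources.
classes = {"Fish", "Mollusca", "WateryHabitat", "Habitat"}
resources = {"FishCollection", "MolluscaCollection", "Joy", "Rachel"}


def classify(statement):
    """Return the layer a (subject, predicate, object) statement belongs to."""
    subject, _predicate, obj = statement
    if subject in classes and obj in classes:
        return Layer.CAC
    if subject in resources and obj in classes:
        return Layer.IAC
    if subject in resources and obj in resources:
        return Layer.III
    if subject in resources:
        # Anything else attached to a resource is treated as a plain value.
        return Layer.IIT
    raise ValueError(f"cannot classify statement: {statement!r}")


statements = [
    ("Fish", "liveHabitat", "WateryHabitat"),
    ("FishCollection", "liveHabitat", "WateryHabitat"),
    ("Joy", "sameHabitat", "Rachel"),
    ("Joy", "weight", "2.3 kilos"),
]
layers = [classify(s) for s in statements]
for statement, layer in zip(statements, layers):
    print(layer.value, statement)
```

A value carrying a unit, such as "2.3 kilos", is still filed as IIT here; linking the unit to an external vocabulary would need an extra statement that this sketch does not model.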
Below is the schema diagram for the hydrus database design. Instances are stored in tables that mirror their hierarchical representation:
(WIP: add diagram for data representation used in python-hydra-agent)
...
Here is a small illustration of how data flows in hydrus.
Hydra API Documentation to server endpoints:
RDF/OWL declarations to server endpoints:
This section explains hydrus's design and a use case for it. For the demonstration, the server has the Subsystems and Spacecraft vocabularies.
Here is an example of a system used to serve data using the components of hydrus:
A simple example explaining the use of the above architecture would be:
- User types in the query “What is the cost of a Thermal Subsystem?”.
- Middleware uses NLP to extract the keywords "Thermal Subsystem" and "cost" and maps them to the Hydra instances and properties present at the server.
- Middleware passes these instances and the underlying query to the client.
- Client models a request and uses the API endpoints to extract the given information from the server.
- Server replies with the required value.
- Client serves data to the User.
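The steps above can be sketched end to end with in-memory stand-ins. Everything here is hypothetical: the SERVER_DATA dictionary replaces the server, naive substring matching replaces the NLP step, a dictionary lookup replaces the HTTP request the client would model, and the cost and mass figures are invented.

```python
# In-memory stand-in for the server's data; the figures are invented.
SERVER_DATA = {
    "Thermal Subsystem": {"cost": 1200, "mass": 35},
}
KNOWN_PROPERTIES = ["cost", "mass"]


def extract_keywords(query, known_instances, known_properties):
    """Middleware step: naive substring matching standing in for NLP."""
    text = query.lower()
    instance = next((i for i in known_instances if i.lower() in text), None)
    prop = next((p for p in known_properties if p.lower() in text), None)
    return instance, prop


def client_fetch(instance, prop):
    """Client step: stand-in for a request to the server's API endpoints."""
    return SERVER_DATA[instance][prop]


def answer(query):
    """Run the whole flow: user query -> middleware -> client -> server."""
    instance, prop = extract_keywords(query, SERVER_DATA, KNOWN_PROPERTIES)
    if instance is None or prop is None:
        return None  # the query could not be mapped to known data
    return client_fetch(instance, prop)


result = answer("What is the cost of a Thermal Subsystem?")
print(result)  # 1200
```

Keeping the middleware, client and server as separate functions mirrors the separation of roles in the list, so each stand-in could be swapped for a real component independently.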