Review latest workflow schemas for EOEPCA+ #233
After an initial review with Angelos, we agreed that we should use OGC API Records to
This is important because it means (as Richard suggested)
E.g.
|
It is important that this approach also supports the types of workflows identified by @edobrowolska. Ewelina identified a number of scripts that can be considered workflows. These scripts might be regarded as unstructured workflows, perhaps like a Jupyter Notebook. At some point they might be converted to a more formal workflow, for instance a CWL file (OGC API Processes). However, there is no reason why these unstructured scripts cannot be used and supported by EarthCODE using the above approach. E.g.
or
The above syntax may not be 100% correct, but hopefully, it demonstrates what is possible with OGC API Records. It may be that |
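To make the idea of linking unstructured scripts from a record more concrete, here is a minimal sketch in Python. It is not taken from the OGC API Records examples: the `make_workflow_record` helper, the `"workflow"` type value, the example URLs and the media types are all assumptions for illustration, and the result has not been validated against any schema.

```python
import json


def make_workflow_record(record_id, title, href, media_type):
    """Build a minimal GeoJSON-style record that links to a workflow resource.

    Hypothetical helper: the field layout follows the general shape of an
    OGC API Records record (a GeoJSON Feature with properties and links),
    but it is a sketch and not a schema-validated document.
    """
    return {
        "id": record_id,
        "type": "Feature",
        "geometry": None,
        "properties": {
            "type": "workflow",  # assumed value, not confirmed in this thread
            "title": title,
        },
        "links": [
            {"rel": "item", "href": href, "type": media_type, "title": title},
        ],
    }


# An unstructured Python script and a Jupyter Notebook, both as link targets.
# URLs and media types are placeholders.
script_record = make_workflow_record(
    "example-python-script",
    "Example Python script",
    "https://example.org/workflows/process.py",
    "text/x-python",
)
notebook_record = make_workflow_record(
    "example-notebook",
    "Example Jupyter Notebook",
    "https://example.org/workflows/analysis.ipynb",
    "application/x-ipynb+json",
)

print(json.dumps(script_record, indent=2))
```

The only difference between the two records is the link target and its media type, which is the point: a script, a notebook or a CWL file could all be described the same way.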
We need to confirm the schema that will be used for validation of OGC API Records. Hopefully we can test some of the above examples using the correct schema. |
I just reviewed this approach with EOEPCA+ and will link them to this user story to help clarify our requirements. I have asked for EOEPCA+ guidance on how to validate schema compliance. This seems quite complicated. The online schema validators do not seem to cope with $ref instances, and there seem to be lots of them for OGC API Records. I have looked at command-line solutions like Polyglottal JSON Schema Validator and these seem to struggle too. For above I used (schemas) |
I also got some useful strategic possibilities from EOEPCA, e.g. from Gérald FENOY |
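One way to see why validators struggle is to list the `$ref` targets a schema depends on. The sketch below uses only the Python standard library; `collect_refs`, `split_refs` and the sample schema are made up for illustration and are not the actual OGC API Records schema. External references are the ones a validator has to fetch or have bundled.

```python
def collect_refs(node, found=None):
    """Recursively collect every string "$ref" value in a JSON schema."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            found.add(ref)
        for value in node.values():
            collect_refs(value, found)
    elif isinstance(node, list):
        for value in node:
            collect_refs(value, found)
    return found


def split_refs(refs):
    """Separate in-document references ("#/...") from external ones."""
    local = sorted(r for r in refs if r.startswith("#"))
    external = sorted(r for r in refs if not r.startswith("#"))
    return local, external


# Illustrative schema only; the URLs are placeholders.
sample_schema = {
    "$id": "https://example.org/schemas/record.json",
    "type": "object",
    "properties": {
        "geometry": {"$ref": "https://example.org/schemas/geometry.json"},
        "links": {"type": "array", "items": {"$ref": "#/$defs/link"}},
    },
    "$defs": {"link": {"type": "object"}},
}

local_refs, external_refs = split_refs(collect_refs(sample_schema))
print("local:", local_refs)
print("external:", external_refs)
```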
Angelos is now back and will ask Peter Vretanos from the OGC API Records SWG about the tool he used to validate all the examples in the specification. |
The link above provided by @kalxas is very helpful and provides a very good starting point to move forwards for EarthCODE. |
I have now successfully validated the OpenEO example above using a command line tool. In summary
Other Examples
OpenEO "links": [
OGC API Processes "links": [
Python Processes "links": [
Jupyter Notebook Processes "links": [ |
This now works with the following examples.
Test using check-jsonschema
https://schemas.wmo.int/wcmp/2.0.0/standard/wcmp-2.0.0.pdf
sudo pip3 install check-jsonschema
Example OGC API Record
OpenEO Example |
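For anyone repeating this, a minimal sketch of driving check-jsonschema from Python follows. The `--schemafile` option is part of the check-jsonschema CLI; the file names `record-bundled.json` and `openeo-example.json` are placeholders, not the files actually used in this thread, and `build_command` and `validate` are hypothetical helpers.

```python
import shutil
import subprocess


def build_command(schema_path, instance_paths):
    """Build the check-jsonschema command line for a local schema file."""
    return ["check-jsonschema", "--schemafile", schema_path, *instance_paths]


def validate(schema_path, instance_paths):
    """Run check-jsonschema and return its exit code, or None if not installed."""
    cmd = build_command(schema_path, instance_paths)
    if shutil.which(cmd[0]) is None:
        return None  # tool is not on PATH
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode


# Placeholder file names; substitute the bundled schema and the record to test.
cmd = build_command("record-bundled.json", ["openeo-example.json"])
print(" ".join(cmd))
```

An exit code of 0 from `validate` means every instance file passed; a non-zero code means at least one failed or the schema could not be loaded.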
@GarinSmith please check the new EOEPCA metadata schema here: |
@kalxas I just looked, |
This is the bundled file: |
Thanks, I have used this new schema, but I am having trouble converting the EOEPCA example to one that works
|
It would also help to have a reference guide such as |
If I understand this correctly, it would be great to prepare a PR towards https://github.com/ESA-EarthCODE/open-science-catalog-validation to add the schema and any other validation artifacts, extending the validation command. The next task would be to add a first example workflow to https://github.com/ESA-EarthCODE/open-science-catalog-metadata and extend the validation action with the extended command. |
@GarinSmith I propose we follow the same pattern, using record.properties.type to specify workflows. There are 2 options:
|
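As a rough illustration of what keying on record.properties.type could look like on the client side, here is a small Python sketch. The value `"workflow"` and the sample records are assumptions; the actual type values are not shown in this thread.

```python
def is_workflow(record, workflow_types=("workflow",)):
    """Return True if record.properties.type is one of the workflow types.

    The default type value is hypothetical; pass the agreed values instead.
    """
    properties = record.get("properties") or {}
    return properties.get("type") in workflow_types


# Made-up records for illustration.
records = [
    {"id": "a", "properties": {"type": "workflow", "title": "Example workflow"}},
    {"id": "b", "properties": {"type": "dataset", "title": "Example dataset"}},
    {"id": "c", "properties": None},
]

workflow_ids = [r["id"] for r in records if is_workflow(r)]
print(workflow_ids)
```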
@kalxas there are some issues with the schema https://github.com/EOEPCA/metadata-profile/blob/master/schemas/resource.json, like the given URL and the $id in the schema are not consistent (e.g. yaml file extension instead of .json) and some other things that the Node-based validator (AJV) we are using is complaining about (seems to be stricter than the Python one check-jsonschema). What's the best way to iterate on this schema to make it work for OSC? |
Yeah, the schema should be hosted in a way that the $id matches the actual URL and ideally it would also return a JSON media type instead of text/plain. Maybe host it through GitHub Pages? AJV also complains about the keyword "example" which should be "examples" (and an array). The format "url" should probably be "uri"?! See https://json-schema.org/understanding-json-schema/reference/string#resource-identifiers Here's a fixed version of the schema with instructions (see bash.sh) how to run it with ajv-cli: |
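The two keyword fixes mentioned here ("example" to an "examples" array, and format "url" to "uri") can be sketched as a small Python clean-up pass. This is not the fixed schema referred to above: `fix_schema` is a hypothetical helper and the sample schema is invented. The `$id` still has to be corrected by hand.

```python
import copy

# Keywords whose values map names to subschemas; the names are not keywords.
SUBSCHEMA_MAPS = ("properties", "patternProperties", "definitions", "$defs")
# Keywords whose values are plain data and must not be rewritten.
DATA_KEYWORDS = ("examples", "enum", "const", "default")


def _fix(node):
    if isinstance(node, list):
        return [_fix(item) for item in node]
    if not isinstance(node, dict):
        return node
    fixed = {}
    for key, value in node.items():
        if key in SUBSCHEMA_MAPS and isinstance(value, dict):
            # Keep property names (even one called "example") untouched.
            fixed[key] = {name: _fix(sub) for name, sub in value.items()}
        elif key == "example":
            # JSON Schema expects "examples" as an array; an existing
            # "examples" keyword takes precedence and "example" is dropped.
            if "examples" not in node:
                fixed["examples"] = [value]
        elif key == "format" and value == "url":
            fixed[key] = "uri"
        elif key in DATA_KEYWORDS:
            fixed[key] = value
        else:
            fixed[key] = _fix(value)
    return fixed


def fix_schema(schema):
    """Return a cleaned copy of the schema; the input is left unchanged."""
    return _fix(copy.deepcopy(schema))


# Invented schema fragment with both problems.
broken = {
    "type": "object",
    "properties": {
        "example": {"type": "string", "format": "url", "example": "https://example.org"},
    },
}
fixed = fix_schema(broken)
print(fixed)
```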
The schema was generated from the OGC API - Records yaml schema using the tools described here: https://github.com/EOEPCA/metadata-profile/tree/master/schemas |
Yeah OpenAPI (3.0) and JSON Schema can't be converted 1:1, it seems the tool doesn't do the conversion very strictly. Shall we propose the changes from my Gist to the upstream schema then? The $id is of course always something that needs to be updated manually after conversion, I guess. |
thanks @m-mohr , opened an upstream pull request |
Thanks @kalxas, there was a second issue:
Is this also something that can be fixed upstream? |
I will check with the SWG |
@kalxas any updates yet? |
Shouldn't the example provided above (i.e. https://schemas.wmo.int/wcmp/2.0.0/examples/de-dwd.surface-weather-observations-realtime.json ) include a conformance class for (static) OGC API - Records? |
no further updates yet |
I think OpenAPI allows for both example (string/object/array) and examples (array) |
That's correct, but then it needs to be converted to examples for JSON Schema. So why not just use examples also in OpenAPI if that's the common denominator? Makes it easier to switch between JSON Schema and OpenAPI. The question with the conformance classes is also still open. |
@m-mohr regarding conformance classes: |
@kalxas How can I resolve in a client which conformance classes apply, including all dependencies? My client can't know all conformance classes of the whole world and their dependencies, but it should be able to validate whether something is a valid Record. It doesn't know whether it's Records, though, because the conformance class is not listed. So it would reject the file as invalid Records. So I think the dependencies need to be listed. I would rather ask: Why would you not list it? |
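The client-side problem can be sketched in a few lines of Python. Both URIs below are assumptions written from memory, not values confirmed in this thread: `RECORD_CORE` stands in for the OGC API Records core conformance class, and the profile URI stands in for whatever a profile such as WCMP declares.

```python
# Assumed URI for the Records core conformance class; check the specification.
RECORD_CORE = "http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core"


def declares_conformance(document, required_uri):
    """Return True only if the document explicitly lists the conformance class."""
    declared = document.get("conformsTo") or []
    return required_uri in declared


# A record that lists only its profile (illustrative URI), not the class
# the profile depends on.
profile_only = {
    "id": "example-profile-record",
    "conformsTo": ["http://wis.wmo.int/spec/wcmp/2/conf/core"],
}
# The same record with the dependency listed explicitly.
with_dependency = {
    "id": "example-profile-record",
    "conformsTo": ["http://wis.wmo.int/spec/wcmp/2/conf/core", RECORD_CORE],
}

print(declares_conformance(profile_only, RECORD_CORE))
print(declares_conformance(with_dependency, RECORD_CORE))
```

A generic client that knows nothing about the profile can only recognise the second document as a Record, which is the argument for listing dependencies.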
Just out of the OGC API Records meeting, we have covered all the pending issues. Changes to be applied shortly as pull requests. |
Can you remind me what the solution for the example vs examples issue was? @kalxas Migration to OpenAPI 3.1? |
Yes, there is a plan to migrate OGC APIs to OpenAPI 3.1 |
The example(s) have been removed from the schema (also upstream) |
Updated PR for records validation in ESA-EarthCODE/open-science-catalog-validation#12 |
Records validation now in: ESA-EarthCODE/open-science-catalog-validation#16 |
EOEPCA+ will look to define a schema for
See - System level
https://github.com/orgs/EOEPCA/projects/4/views/13?sliceBy%5Bvalue%5D=Resource+Discovery&pane=issue&itemId=60227850
See - BB level
https://github.com/orgs/EOEPCA/projects/7/views/1?filterQuery=workflow&pane=issue&itemId=69113579
Garin to discuss with @rconway and Angelos, whose GitHub id I cannot find yet.
Garin also to confirm that EOEPCA+ Catalogue can ingest and discover OGC API Records. We believe that we will need to write a new Front End in the portal to Find and Access OGC API Records in the same way we do for STAC.
pycsw supports OGC API - Records - Part 1: Core, version 1.0 by default.
See https://docs.pycsw.org/en/latest/oarec-support.html
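As a starting point for confirming discovery, the sketch below builds an OGC API Records search URL in Python. The `/collections/{collectionId}/items` path and the `q` and `type` query parameters come from OGC API - Records - Part 1; the base URL, the collection id `metadata:main` and the search values are assumptions that need to be replaced with the real EOEPCA+ catalogue details. `build_items_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def build_items_url(base_url, collection_id, **params):
    """Build a records search URL for one collection of an OGC API Records server."""
    path = f"{base_url.rstrip('/')}/collections/{quote(collection_id, safe='')}/items"
    query = urlencode(params)
    return f"{path}?{query}" if query else path


# Placeholder endpoint and collection id; the type value is hypothetical.
url = build_items_url(
    "https://catalogue.example.org",
    "metadata:main",
    q="workflow",
    type="workflow",
)
print(url)
```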
Angelos has confirmed that OSC currently has support to ingest and discover OGC API Records.
Angelos noted that "Open Science Catalog is 2 versions behind, several fixes and new features have been implemented in the last 6 months or so"
Angelos is off from 9 Aug to 9 Sep, but I will meet him tomorrow for his advice on
EOEPCA+ will look to define a schema for
Please also refer to
https://github.com/orgs/ESA-EarthCODE/projects/5/views/8?pane=issue&itemId=72092886