How to publish Open Data from MELODIES #3
The first three goals are all at the dataset level. For these we should, at a minimum, use the established W3C vocabularies DCAT and VoID for describing datasets (I don't think schema.org will be that useful currently, but I may be wrong). DCAT and VoID overlap in their general metadata, since both reuse other vocabularies such as Dublin Core and FOAF. The difference between them is that DCAT is for arbitrary datasets (the actual data can be files in any format), while VoID is specifically meant for Linked Open Data (LOD) datasets, where the whole dataset consists of RDF triples. This is evident from the fact that in VoID you should provide a SPARQL endpoint for the dataset, and may provide statistics about the triples (counts, essentially the dataset size). VoID can also link to an OpenSearch description document (OpenSearch.xml), but this only supports free-text search within the dataset. Some facts:
On metadata hosting: in the end the datasets will be hosted on some server anyway, providing things like a GeoSPARQL endpoint and some efficient way of accessing raster data, for example OPeNDAP or WCS (possibly linked to from the RDF somehow). On that server the dataset will probably have its own URL, and the metadata can be stored alongside it. VoID describes three ways of doing just that. I think that's a minimum. The next step would be to point catalogs to this metadata so they can harvest it. However, I don't know of any catalog that offers what we want. As far as I can see, only closed ones (NASA's, for example) have rich query capabilities such as bounding-box and time-range searching. If the available plugins are not enough, we may have some luck adding a bit more temporal and geospatial sauce to CKAN (the software behind catalog portals such as datahub.io). That could be one of the MELODIES contributions on the software side. Things I haven't discussed here are how to model the datasets themselves, how observations are linked to the metadata, and how to integrate raster data. I think these have to be settled first, before thinking about how to expose the data in graphical portals.
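As a rough illustration of that minimum, here is a Python sketch that writes a VoID description (with the SPARQL endpoint and a triple count) as Turtle, and computes the `/.well-known/void` URL, which is one of the discovery locations the VoID spec defines. The dataset URI, title, endpoint, and the helper names `void_description` and `well_known_void_url` are all hypothetical; a real setup would more likely use an RDF library than string formatting.

```python
from urllib.parse import urljoin


def _lit(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def void_description(dataset_uri: str, title: str,
                     sparql_endpoint: str, triples: int) -> str:
    """Return a minimal VoID description of one RDF dataset as Turtle text.

    Hypothetical helper: the publisher supplies all values; nothing here is
    queried from the endpoint or validated.
    """
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f"    dcterms:title {_lit(title)} ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        # A bare integer in Turtle is read as an xsd:integer literal.
        f"    void:triples {triples} .",
        "",
    ])


def well_known_void_url(dataset_page: str) -> str:
    """Return the /.well-known/void URL on the host that serves dataset_page."""
    # An absolute path passed to urljoin replaces the whole path of the base URL.
    return urljoin(dataset_page, "/.well-known/void")


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    print(void_description("http://data.example.org/melodies/landcover#dataset",
                           "Example land cover dataset",
                           "http://data.example.org/melodies/sparql",
                           1000000))
    print(well_known_void_url("http://data.example.org/melodies/landcover"))
```

The same Turtle file could then be served both at the dataset's own URL and at the well-known location, so harvesters have more than one way to find it.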
+1 for DCAT... just please don't ignore the "Catalog" element. However, DCAT's capabilities for geo are quite feeble...
Thanks Pedro - what do you think MELODIES should do for a catalogue? Should we expose our own "demonstrator" catalogue (e.g. with OpenSearch Geo/Time interfaces)? Or is there another catalogue we could plug into (e.g. on Terradue's platform) that we could use to demonstrate what we have been doing?
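To make the OpenSearch Geo/Time idea a bit more concrete, here is a sketch of how a client fills an OpenSearch URL template with `geo:box` (west,south,east,north in decimal degrees) and `time:start`/`time:end` (RFC 3339 timestamps). The template and catalogue URL are hypothetical, and the sketch assumes the description document binds the `geo:` and `time:` prefixes to the Geo and Time extension namespaces. Following OpenSearch 1.1, optional parameters the client leaves unset are replaced with an empty string.

```python
import re
from urllib.parse import quote

# Hypothetical URL template, as it might appear in a catalogue's OpenSearch
# description document (assumes the geo: and time: prefixes are declared there).
TEMPLATE = ("http://catalogue.example.org/search?q={searchTerms?}"
            "&bbox={geo:box?}&start={time:start?}&end={time:end?}")

# Matches {name} and {name?}; group 2 is "?" for optional parameters.
_PARAM = re.compile(r"\{([^{}?]+)(\??)\}")


def geo_box(west: float, south: float, east: float, north: float) -> str:
    """Format a geo:box value: west,south,east,north in decimal degrees."""
    return f"{west},{south},{east},{north}"


def fill_template(template: str, values: dict) -> str:
    """Substitute template parameters with URL-encoded values.

    Unset optional parameters become empty strings; an unset required
    parameter raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            # Keep "," and ":" readable (bounding boxes, RFC 3339 times).
            return quote(str(values[name]), safe=",:")
        if optional:
            return ""
        raise KeyError(f"required parameter {name!r} not supplied")

    return _PARAM.sub(substitute, template)


if __name__ == "__main__":
    print(fill_template(TEMPLATE, {
        "geo:box": geo_box(-10, 35, 30, 60),
        "time:start": "2010-01-01T00:00:00Z",
        "time:end": "2010-12-31T23:59:59Z",
    }))
```

The point of the template approach is that the server, not the client, decides the actual query parameter names (`bbox`, `start`, `end` above are made up); the client only needs to understand `geo:box` and `time:start`/`time:end`.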
Currently each partner has data repositories and catalogue services as part of the cloud platform's baseline services, and these have been used in developing and integrating their MELODIES services. What we are missing is a public top-level catalogue that could aggregate and expose particular collections as Open Data. This study in WP3 will be very useful to frame the metadata model. Among other things, it will help us check whether DCAT is a feasible way to improve our catalogue solution.
Currently we’re thinking of publishing the MELODIES catalogue as an RDF document using DCAT (and maybe VoID). We think that CKAN instances can harvest this. Would this work for Terradue? How might we include OpenSearch capabilities?
I think we should split this issue up, as it covers too much. In general, I think we have these topics:
Working on these separately, with some cross-referencing between the discussions, is better I think than having one massive thread covering everything.
(I created a new issue #9 to discuss where MELODIES data should be published.)
This is to start a discussion on how we should publish data from the MELODIES project as Open Data. The ideal situation is to publish five-star linked open data where everything is described as RDF, with links to other datasets and vocabularies.
The current list of open data planned from the MELODIES project is on the EMDESK Wiki (perhaps we should move it to GitHub? See #5), although we should consider other datasets too, in an attempt to identify generically useful methods.
We consider three levels of information:
Our goals are:
How can we achieve the above? Questions include:
Discussion is welcome!