Epic: Track changes to a data series and push the date of the latest change to CKAN #298
Comments
I'm a bit unclear why you wouldn't manage all of this info directly in CKAN. It already has the capability to store all this kind of info, and it saves you having to reinvent the wheel by adding support for this in CPS (and then pushing it back across into CKAN). |
We use CPS to import and normalize data and maintain referential integrity. Since at least some of the info has to be maintained on the CPS side, our data managers feel it would be easier to manage all of it there. This is just for those datasets that we curate, not user contributed datasets which live solely on CKAN. |
@cjhendrix I guess the question is why you couldn't maintain all the info on the CKAN side here based on DRY principles? Generally, I think it would be really useful (for me) to understand a bit more about the overall architecture especially of CPS to understand what is being done where and how as I can then offer more useful input :-) |
Note to Sam. Understood that this one will likely carry over multiple sprints given your availability. |
The biggest difficulty I see here is that CPS does not know about the curated datasets. Instead, the curated datasets know about CPS. If we add some kind of mapping, allowing CPS to know which curated datasets to update when some data (or metadata) changes are detected, we still have 2 places to maintain. If we add a new indicator, we have to create it in CPS, create the curated dataset, and they both must know about each other. So we don't follow the DRY principle, and I am not sure this will be simpler for the data team. The gain here would be that once this is set up, the updates should be replicated. I think we should have a call dedicated to this topic. |
So after discussion, here is the plan: there is a one-to-one relationship between data series and CKAN datasets. So if we detect a change in the data or metadata for a data series, we can push it to the corresponding dataset.
What I can do already is the following:
|
@seustachi it would be super useful to get a bit of a diagram here to understand what is going on. As mentioned, you'll want to be careful about not ending up with your authoritative metadata in two places (and getting stuff out of sync). |
@seustachi The key thing we need to urgently solve is the high value test case listed in the original issue above. If I understand your last comment above, it sounds like you are putting that one as secondary. Happy to discuss, but I think you need to focus your effort on that one. |
@cjhendrix I'm not treating it as a secondary priority. Detecting a change related to a data series is a prerequisite. Once we have that, we will be able to push information to CKAN. |
Ok, thanks for the clarification. |
So, we agreed that:
LastUpdateDate changes only if at least one value was added or updated |
List of the extras keys we want to use: "dataset_source" for the sourceName
|
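The agreed rule for LastUpdateDate can be sketched roughly as below. This is only an illustration, not CPS code: the function names and the mapping from period to value are hypothetical, and the sketch assumes that deleting a value does not count as a change, since only "added or updated" was agreed.

```python
from datetime import datetime, timezone


def detect_changes(old_values, new_values):
    """Return the keys that were added or whose value changed.

    Both arguments are hypothetical {period: value} mappings for one
    data series. Keys that only disappear are ignored (assumption).
    """
    return sorted(
        key
        for key, value in new_values.items()
        if key not in old_values or old_values[key] != value
    )


def next_last_update(previous, old_values, new_values, now):
    """Keep the previous LastUpdateDate unless a value was added or updated."""
    if detect_changes(old_values, new_values):
        return now
    return previous


if __name__ == "__main__":
    old = {"2012": 10.0, "2013": 11.0}
    new = {"2012": 10.0, "2013": 12.0, "2014": 13.0}
    print(detect_changes(old, new))
    print(next_last_update(None, old, new, datetime.now(timezone.utc)))
```

Passing `now` in explicitly keeps the rule easy to test; a re-import that produces identical values leaves the date untouched.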
Format of the action we want to use is documented here : |
@cjhendrix @alexandru-m-g Do we keep a human-readable title (title_with_underscore___sourceCode) or do we want (indTypeCode_SourceCode)? I think I remember that CJ preferred the human-readable one. If we do that, we have to manage the title in CPS (to be able to push updates). Is that what we want? |
Please, when in doubt about any names, favor human readable over anything else and url slug over human readable. |
@seustachi It's the former, for example: https://data.hdx.rwlabs.org/dataset/proportion_of_the_population_using_improved_sanitation_facilities___mdgs Alex is making the change in sprint 46 (2 week sprint starting 5 Jan): OCHA-DAP/hdx-ckan#1771 As for managing the title in CPS, that should be fine. The only thing we shouldn't manage is the "name", which is used for the URL. |
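For locating the matching dataset from CPS, the human-readable convention could be derived along these lines. This is a sketch under assumptions: the function name is hypothetical, the exact slug rules used on the CKAN side are not stated in this thread, and the example title is inferred from the URL above. It is meant for looking a dataset up, not for rewriting its "name".

```python
import re


def dataset_name(indicator_title, source_code):
    """Build a name of the form title_with_underscore___sourceCode.

    Assumed rules: lowercase everything, collapse each run of
    non-alphanumeric characters in the title into one underscore, then
    join title and source code with three underscores.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", indicator_title.lower()).strip("_")
    return f"{slug}___{source_code.lower()}"


if __name__ == "__main__":
    print(dataset_name(
        "Proportion of the population using improved sanitation facilities",
        "MDGs",
    ))
```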
What we want now is to trigger the metadata update if a new indicator value is added or an existing one is changed, because we need to change the date range of the values |
And we also want to update the date of the last "update" of the dataset. Check with @alexandru-m-g whether we store it on the dataset or on the resource. This is a new metadata field, updated whenever the data is updated |
which there is some data for the dataserie
to appear under "name" instead of "id"
@cjhendrix Moved to sprint 48. Even if we started to implement this epic in sprint 46, and some work was also done on sprint 47, some sub-tasks are still pending and planned for sprint 48 or later |
The goal is to allow all the information about a ckan indicator (which is simply a ckan dataset that is coming from CPS) to be maintained in one place: CPS. CPS would then have the ability to push changes to this information (let's call it Ancillary Indicator Information, AI2) to CKAN via CKAN's action API.
The goal of this epic is to set up the framework for this using a high value test case, described below.
Consider all the indicators returned from this search: https://data.hdx.rwlabs.org/dataset?q=fts+cross-appeal Note that the "Updated By" date for all of them is July 7, which is the date when the CKAN datasets were created. The data on CPS has been updated at least weekly since then, but CKAN has no way of knowing this. This epic will result in these dates being updated by CPS whenever a change is made to the data series. Later we will expand this approach to allow all of the AI2 to be managed in CPS.
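As a rough illustration of what a push from CPS might look like, the sketch below builds (but does not send) a request to CKAN's action API. Several things here are assumptions rather than decisions from this issue: it uses the `package_patch` action, which needs a CKAN version that provides it; the helper name, base URL, API key and the `last_data_update` extras key are placeholders; and where the date is finally stored (dataset or resource) is still open.

```python
import json
from urllib.request import Request


def build_patch_request(base_url, api_key, dataset_name, extras):
    """Build a POST request for CKAN's package_patch action.

    `extras` is a plain dict; CKAN expects a list of
    {"key": ..., "value": ...} objects. Nothing is sent here.
    """
    payload = {
        "id": dataset_name,  # CKAN accepts either the dataset id or its name
        "extras": [{"key": k, "value": v} for k, v in sorted(extras.items())],
    }
    return Request(
        url=f"{base_url.rstrip('/')}/api/3/action/package_patch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


if __name__ == "__main__":
    req = build_patch_request(
        "https://ckan.example.org",      # placeholder URL
        "my-api-key",                    # placeholder key
        "some_indicator___src",          # placeholder dataset name
        {"dataset_source": "Some Source", "last_data_update": "2014-12-01"},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

One thing to watch: patching `extras` replaces the whole extras list on the dataset, so a real implementation would probably read the existing extras first and merge before pushing.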
The list of AI2 to be managed by CPS will ultimately include: