Composite Time Series Design document. #1103
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,300 @@ | ||
##################### | ||
Composite Time Series | ||
##################### | ||
|
||
Purpose | ||
======= | ||
|
||
It is a challenge for users to identify the correct authoritative time series for a given measurement at a location when there are multiple time series at the same location. Additionally, these time series often change over time, either being completely new or changing their interval as newer technologies become available. | ||
|
||
Gathering an entire Period of Record (POR) for the value at a location is also rather difficult. The POR and the "authoritative time series" may be one and the same. | ||
|
||
|
||
Need | ||
==== | ||
|
||
#. CWMS and Access-2-Water require a simple mechanism to allow users of data to retrieve the Authoritative Period of Record data for a given measurement without having to understand all of the possible component time series that may be involved. | ||
#. Period-of-Record time series *should* not be created by duplicating data from the component time series and merging them into a new one. | ||
#. The naming of the time series should fit within the existing CWMS Time Series Identifier design and not unreasonably interfere with existing usages. | ||
|
||
|
||
Caveats | ||
======= | ||
|
||
#. It is assumed that CWMS-Vue will, as always, require updates to handle what is created here. | ||
#. That is, we're not going to let any current limitations of CWMS-Vue hinder our design. | ||
|
||
|
||
Proposal | ||
======== | ||
|
||
Description | ||
----------- | ||
|
||
CDA should handle a concept of a "Composite Time Series". Whether a Time Series is considered composite will be determined by a specific element of the Time Series Identifier. | ||
Data Administrators will configure which Time Series (members), and the date-time range of each, define the composite time series. | ||
CDA will use this stored information to build the Composite Time Series during a query. | ||
|
||
Additional names not used | ||
------------------------- | ||
|
||
#. Virtual Time Series | ||
#. Period of Record Time Series | ||
|
||
Both names have been discarded. We use "Virtual" in too many other places with a more direct meaning of that word. | ||
For Period-of-Record, while that is the primary use-case, the concept is useful in other situations as well. | ||
|
||
Hence, generically, we have a "composite time series". | ||
|
||
Axioms | ||
------ | ||
|
||
#. Composite Time Series are Irregular | ||
#. The definition of the composite time series is stored within the CWMS database | ||
#. The members of a composite time series define a continuous range | ||
|
||
#. The date ranges of members *MUST* not overlap | ||
#. The date ranges of members *MUST* not have any gaps | ||
#. Data may have gaps; an explanation range should be provided. | ||
Is this user defined? Will it allow for the definition of a timeseries that may not have extents covering that time period (e.g. there's a ~2 month gap between timeseries A, which ends at 2014-01-03 12:00, and timeseries B, which starts at 2014-03-14 12:00)? What does an explanation range look like (e.g. "no data, start 2014-01-03 12:00, end 2014-03-14 12:00")? Is that assigned automatically if there is a gap in the timeseries?
Ah, I definitely did not describe this clearly enough in the sample data.
A gap here means that you have an end date for one member and then... and now I've realized that if you don't have a defined interval it's hard to determine this. Perhaps that should change to SHOULD, since I don't think the system can meaningfully define what a "gap" in a member is. Does the next start have to be the smallest time unit after the previous end (e.g. nanoseconds)? If not, what is acceptable? Here's what I was thinking: how do we handle known gaps in service, be it accidental destruction (2 different SPK/SPN gauges have suffered alcohol-related removals from service)? One site at SPK is removed during most of a year because there is no water and it kept suffering vandalism. So the intent is "there's an oddly large amount of missing data; how do we report that?"
Well, this is something that doesn't currently exist in CWMS at all, as I realized when developing a system to read in punch tapes. There are notes on the tapes for station maintenance, but I have nowhere to save that information in a meaningful, easy-to-access way.
It's not well documented, but if you write to a text timeseries with a TS ID that matches the data TS ID, that effectively becomes the "NOTES". CDA itself should be set up to allow those entries and retrieve them in the current interface. But as you said, that is not in scope of this. |
||
|
||
#. The members of a composite time series measure the same thing (e.g. all members are Elevation; you *cannot* combine elevation and stage as members). | ||
#. The interval and duration of each member *MAY* be different. | ||
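The range axioms above can be checked mechanically. The following is a minimal sketch, not the CDA implementation: the ``Member`` class, ``parameter_of`` and ``validate_members`` are hypothetical names, and it assumes the start-inclusive/end-exclusive convention used in the member definition later in this document, so two members are continuous exactly when the next ``start`` equals the previous ``end``. It also assumes that "measure the same thing" means the same base parameter (the text before any ``-`` in the second TS ID element).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Member:
    """Hypothetical in-memory form of one composite member."""
    ts_id: str        # member time series identifier
    start: datetime   # inclusive
    end: datetime     # exclusive


def parameter_of(ts_id: str) -> str:
    """Base parameter: second dot-separated element, before any '-' (assumption)."""
    return ts_id.split(".")[1].split("-")[0]


def validate_members(members: List[Member]) -> List[str]:
    """Return human-readable axiom violations; an empty list means valid."""
    problems = []
    ordered = sorted(members, key=lambda m: m.start)
    for m in ordered:
        if m.end <= m.start:
            problems.append(f"{m.ts_id}: end is not after start")
    # With an exclusive end, adjacent members must share a boundary exactly.
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start < prev.end:
            problems.append(f"{prev.ts_id} overlaps {nxt.ts_id}")
        elif nxt.start > prev.end:
            problems.append(f"gap between {prev.ts_id} and {nxt.ts_id}")
    if len({parameter_of(m.ts_id) for m in ordered}) > 1:
        problems.append("members do not all measure the same parameter")
    return problems
```

Under this reading the "smallest time unit" question does not arise, because no tolerance between one member's end and the next member's start is needed. Whether gaps should be a hard error (MUST) or only a warning (SHOULD) is still open in the review discussion, so the sketch only reports them.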
|
||
|
||
Time Series Naming | ||
------------------ | ||
|
||
Option 1 | ||
~~~~~~~~ | ||
|
||
`<Location Id>.<Parameter>.<Parameter Type>.Composite.var.<version>` | ||
|
||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
| Element | Description | | ||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Location Id |As the normal CWMS TS ID, the location for this measure | | ||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) | | ||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter Type |As Normal CWMS TS ID, Instantaneous, average, total, etc | | ||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Interval -\> Composite| Marker that this time series does not have fixed information and is built of various member time series.               | | ||
I'm of the opinion that:
Tying back into an earlier section, that means the composite doesn't have to be irregular, and could potentially be regular, lrts, or prts depending on the sources. Now, that leads to the section I highlighted:
If that's not feasible, then perhaps adding to the interval, like lrts did: Otherwise, how would you specify composite data for different intervals and types, and keep them separate? At SPK we have period-of-record data like this: Currently that's a separate timeseries with duplicate data. But it doesn't have to be. Otherwise, you end up with something like: New Bullards Bar.Elev.Composite.~1Day.0.POR - Is that averaged data, or instantaneous?
The intent is to avoid duplicating data while providing a simple name for the entire range of the Period of Record for the measurement at a location (the period of record is where this started). That said, I think I agree about the Duration, or at least that if a duration is specified everything must match. But as a matter of historical record and how things changed, the durations do change and would be indicated in each. I think one thing to point out: the results of this aren't meant for the entire PoR to be cleanly put into a display, but to provide the data as it was used at that time. Downstream users, say doing a study, would need to determine how they would interpolate/correct any gaps.
what about
As you said, it's an alias that will be processed before further work, and we don't use [] for anything so far as I'm aware. There are certainly other options. That said, since it is an alias, maybe we should just add yet-another .
That said, it would be more work to use in, say, OpenDCS and any client software, but it is an option. But I do agree that fixing it to one location does limit things that appear rather useful. |
||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Duration -\> var |Duration of average or total may change over time with new members, duration will be indicated in the member definition | | ||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Version |As Normal CWMS TS ID | | ||
Could a common version name denote it as authoritative? Or does just the existence of a composite timeseries imply that?
That's a possibility. I had put a placeholder "is-authoritative" flag in the composite definition. Though I agree the version is technically a good place for that, it does seem to get a bit... overused at times. There are certainly arguments to be made in either case, so we'll wait for commentary from others to tip the scales. |
||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|
||
|
||
Option 2 | ||
Option 2 makes more sense to me. I'm unsure how Option 1 would deal with potentially varying parameter types among a set of composite data. I'm a little curious about the implications of doing something like
I think @msweier made a fairly decent case for why you might have more than one composite for a location+measure. It does make sense to me to have a "single authoritative" time series followed by "all data with interval X". It really depends on exactly what you're doing with the data.
As I commented elsewhere, I say make it work like aliases, so if you fetch a composite, it checks the composite list first, and if not found there, then regular timeseries, or something like that.
Oh, that's interesting. Technically you wouldn't even need Composite in the name.... Will have to think about that. Going to type it up though. |
||
~~~~~~~~ | ||
|
||
`<Location Id>.<Parameter>.Composite.0.0.<version>` | ||
|
||
|
||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
| Element | Description | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Location Id |As the normal CWMS TS ID, the location for this measure | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter Type Composite|Marker that this time series does not have fixed information and is built of various member time series.                | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Interval -\> 0 |Interval of data elements. may change over time with new members, duration will be indicated in the member definition | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Duration -\> 0 |Duration of average or total. may change over time with new members, duration will be indicated in the member definition| | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Version |As Normal CWMS TS ID | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
I like these options, but it would be nice to differentiate between a POR timeseries that includes all best available intervals (e.g. daily inst, 4 hr, 1 hr, 15 minute) and a POR timeseries that includes the best available on a daily interval (e.g. 8 am inst or daily avg). MVP's merged TS denotes these as ~15Minutes and ~1Day but maybe there's a better way. I'm thinking some way like the USGS makes it easy to pick instantaneous value data vs. daily data.
That seems like the type of information to put in the version, if desired. My assumption with the POR time series was that it should be suitable for "this is the full record as we have it", knowing that over time the official record has improved methods of measurement. So the first few decades could be daily instantaneous, the next decade 12 hours, then 1 hour, then 15 minutes, and maybe things would change to an average or not. But if you go further down in the document you'll see that the returned time series values also include the members with their definition. So yes, you could make a composite time series that only included certain intervals and durations, but the composite system itself wouldn't care. That said, we could open up the definition to allow the interval and duration to be set; we would then need to decide if that is enforced. For example:
I'm not opposed, and I don't think that adds too much complexity, but it's another one of those "more feedback from the group would be good" type things.
This is completely different from my understanding of what we were going to have for POR. Honestly, I can't think of any use case where having everything jumbled into a single time series is remotely useful. As it is, I have a hard time wanting to classify readings from two different sensor types (e.g. bubbler vs shaft-encoder) into a single POR. It's not the same data. Yes, it represents the same real-world measurement, but how useful is it to have them together? You can't run any worthwhile scientific/mathematical analysis on the data, since different sensors respond in different ways and can throw off expectations. Also, what if we're actively recording the same measurement with two different sensors? Which do we put into the POR?
not automatically, no.
For a generic composite time series, whatever you want. For "Period of Record" that's intended to be what was used to make any decisions. |
||
|
||
The zeros could also be var. | ||
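To make the difference between the marker-based options concrete, here is a small sketch of how a client might detect where the literal ``Composite`` marker sits in a TS ID. It assumes the usual six dot-separated elements; ``ELEMENTS``, ``MARKER_ELEMENTS`` and ``composite_element`` are hypothetical names for illustration, not part of CDA.

```python
from typing import Optional

# Element names of a CWMS TS ID, in order.
ELEMENTS = ("location", "parameter", "parameter_type",
            "interval", "duration", "version")
# Elements in which the options discussed here place the marker.
MARKER_ELEMENTS = {"parameter_type", "interval", "version"}


def composite_element(ts_id: str) -> Optional[str]:
    """Name the element holding the 'Composite' marker, or None if absent."""
    parts = ts_id.split(".")
    if len(parts) != len(ELEMENTS):
        raise ValueError(
            f"expected {len(ELEMENTS)} dot-separated elements, "
            f"got {len(parts)}: {ts_id!r}"
        )
    for name, value in zip(ELEMENTS, parts):
        if name in MARKER_ELEMENTS and value == "Composite":
            return name
    return None
```

For example, ``New Bullards Bar.Elev.Inst.Composite.var.POR`` would report ``interval`` (Option 1), while ``New Bullards Bar.Elev.Composite.0.0.POR`` would report ``parameter_type`` (Option 2). The sketch compares the marker case-sensitively, which is an assumption.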
|
||
|
||
Option 3 | ||
~~~~~~~~ | ||
|
||
`<Location Id>.<Parameter>.<Parameter Type>.<Interval>.<Duration>.Composite` | ||
|
||
|
||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
| Element | Description | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Location Id |As the normal CWMS TS ID, the location for this measure | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter Type          |Marker that this time series does not have fixed information and is built of various member time series.                | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Interval |Interval of data elements. may change over time with new members, duration will be indicated in the member definition | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Duration |Duration of average or total. may change over time with new members, duration will be indicated in the member definition| | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Version |Composite or POR ... or check for composite at the front/back? | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|
||
From Daniel | ||
|
||
Argument Against: the "Version" field is freeform and we often encode other information in it. | ||
Argument Against above argument: That said, perhaps forcing the version to be "clean" is the right choice here. | ||
|
||
|
||
Option 4 | ||
~~~~~~~~ | ||
|
||
`<Location Id>.<Parameter>[Composite].<Parameter Type>.<Interval>.<Duration>.<Version>` | ||
|
||
|
||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
| Element | Description | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Location Id |As the normal CWMS TS ID, the location for this measure | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter Type          |Marker that this time series does not have fixed information and is built of various member time series.                | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Interval |Interval of data elements. may change over time with new members, duration will be indicated in the member definition | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Duration |Duration of average or total. may change over time with new members, duration will be indicated in the member definition| | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Version |As Normal CWMS TS ID | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|
||
|
||
This form, with something in [], has been discussed for embedding TimeZone and Offset information in the interval. Arguably this could go in any field. | ||
|
||
|
||
Option 5 | ||
~~~~~~~~ | ||
|
||
`<Location Id>.<Parameter>.<Parameter Type>.<Interval>.<Duration>.<Version>` and/or arbitrary TS "alias" | ||
|
||
|
||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
| Element | Description | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Location Id |As the normal CWMS TS ID, the location for this measure | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter Type          |Marker that this time series does not have fixed information and is built of various member time series.                | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Interval |Interval of data elements. may change over time with new members, duration will be indicated in the member definition | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Duration |Duration of average or total. may change over time with new members, duration will be indicated in the member definition| | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Version |As Normal CWMS TS ID | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|
||
However, when a time series is requested, the list of composite time series is consulted first and used if a match is present; otherwise the request passes through to normal | ||
time series retrieval. | ||
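The alias-style lookup described above could be sketched as follows. This is an illustration only: ``retrieve``, ``composites``, ``fetch_composite`` and ``fetch_regular`` are hypothetical names, and the composite list is modeled as a plain dictionary keyed by TS ID.

```python
from typing import Any, Callable, Dict


def retrieve(ts_id: str,
             composites: Dict[str, dict],
             fetch_composite: Callable[[dict], Any],
             fetch_regular: Callable[[str], Any]) -> Any:
    """Consult the composite list first; otherwise pass through unchanged."""
    definition = composites.get(ts_id)
    if definition is not None:
        # A composite definition exists under this name: build from members.
        return fetch_composite(definition)
    # No composite registered: normal time series retrieval.
    return fetch_regular(ts_id)
```

With this approach the name itself needs no ``Composite`` marker, at the cost of one extra lookup per request. The sketch matches TS IDs exactly and does not address case-insensitive matching.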
|
||
|
||
Composite Time Series Definition | ||
================================ | ||
|
||
.. code-block:: jsonc | ||
|
||
    { | ||
        "office": "<string>", | ||
        "name": "<ts id name>", | ||
        "is-authoritative": true, // or is authoritative. to distinguish between other possible use-cases? | ||
        "members": [ | ||
            { | ||
                "time-series-id": "TS ID for this range", | ||
                "start": "start date of this range", // Inclusive | ||
                "end": "end date of this range", // Exclusive | ||
                "notes": "text" | ||
            } | ||
        ] | ||
        // array above *should* be sorted by start when provided to user. | ||
    } | ||
|
||
|
||
Operations required: | ||
|
||
* Create | ||
* Remove member (ts id + range) | ||
* Add member | ||
* List members | ||
* Replace all members? | ||
* Delete | ||
|
||
|
||
Composite Time Series Response | ||
============================== | ||
|
||
.. code-block:: jsonc | ||
|
||
    { | ||
        // ... as current TimeSeries JSON | ||
        "composite-members-present": [ | ||
            // member definition from above | ||
        ] | ||
    } | ||
|
||
|
||
Supported Operations: | ||
|
||
* Get, through existing TimeSeries classes. | ||
|
||
|
||
Storage of member information | ||
================================ | ||
|
||
#. Store in a CLOB as we refine the design - cache appropriately in memory to avoid any major performance issues. | ||
#. Create appropriate tables once the design is stable - still cache things. | ||
|
||
System responsibility for "knowing" to process composite. | ||
========================================================= | ||
|
||
Time Series Catalog | ||
------------------- | ||
|
||
Time Series Catalog should show composite time series and allow searching by "authoritative" | ||
just FYI: we'd have to teach CWMS-Vue a new Interval definition or a new Parameter Type in order for these to show up properly. I know you called out inevitable CWMS-Vue updates elsewhere, but wanted to be explicit. |
||
|
||
TimeSeries DTO | ||
-------------- | ||
|
||
Add nullable "members" property. | ||
|
||
TimeSeriesDao | ||
------------- | ||
|
||
If the system sees the "Composite" marker, or otherwise determines that the time series is composite, it retrieves the members for the range and builds the time series. | ||
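A rough sketch of that build step is shown below, assuming each member is a dictionary shaped like the definition above with ``start``/``end`` already parsed to ``datetime`` objects. ``build_composite`` and the ``fetch`` callback are hypothetical; ``fetch(ts_id, lo, hi)`` is assumed to return the member's values with ``lo <= time < hi``. The actual DAO would do this against the database.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Value = Tuple[datetime, float]


def build_composite(members: List[Dict],
                    begin: datetime,
                    end: datetime,
                    fetch: Callable[[str, datetime, datetime], List[Value]]) -> Dict:
    """Stitch member data for [begin, end) into one series."""
    values: List[Value] = []
    present: List[Dict] = []
    for m in sorted(members, key=lambda m: m["start"]):
        # Clip the member's range to the requested window.
        lo = max(begin, m["start"])
        hi = min(end, m["end"])
        if lo >= hi:
            continue  # member does not intersect the requested window
        values.extend(fetch(m["time-series-id"], lo, hi))
        present.append(m)
    # Report only the members that contributed, as in the response format.
    return {"values": values, "composite-members-present": present}
```

Because member ranges do not overlap and are processed in start order, the concatenated values stay in time order without a merge step, provided each ``fetch`` call returns its values in time order.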
|
||
.. NOTE:: | ||
    Considering the user may request the *entire* Period-of-Record, this is a good opportunity to detect that, | ||
    start the retrieval in a job queue, and return a status URL to the user for future download. I have seen such mechanisms | ||
    for bulk data in other systems. Maybe return an "I'm working on it" variant that the controller knows how to format. | ||
|
||
Perhaps we do this for data beyond "x amount"? | ||
|
||
Error handling and other conditions. | ||
==================================== | ||
|
||
Versioned (date) time series | ||
---------------------------- | ||
|
||
It is an error to specify a Version (date) when requesting composite data. | ||
|
||
Datum conversions | ||
----------------- | ||
|
||
Retrievers of the Period-of-Record *SHOULD* be able to retrieve the data in a single datum. Composite retrieval should respond | ||
as described in https://github.com/USACE/cwms-data-api/issues/1102 and convert each member as appropriate. | ||
|
||
|
||
On the saving of a composite definition | ||
--------------------------------------- | ||
|
||
When only a single member is added, the full definition needs to be checked to ensure the ranges are still non-overlapping and continuous. | ||
|
||
References | ||
========== | ||
|
||
#. https://github.com/USACE/cwms-data-api/discussions/956 | ||
#. https://github.com/USACE/cwms-data-api/issues/955 | ||
#. https://www.hec.usace.army.mil/confluence/spaces/CWMS/pages/290456000/Virtual+Timeseries | ||
#. https://discourse.hecdev.net/t/period-of-record-timeseries/3859/2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
################ | ||
Design Documents | ||
################ | ||
|
||
The following pages formally document current and proposed designs | ||
relating to operations and usage of data. | ||
|
||
.. toctree:: | ||
    :maxdepth: 2 | ||
    :caption: Introduction | ||
|
||
    Composite Time Series <./composite-time-series.rst> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
#### | ||
Java | ||
#### |
Is there any way to add a comment or remark to a timeseries (e.g. staff gage readings, gate computations, tailwater rating etc...) or to the composite timeseries itself? If not this could be separately managed in a CLOB, but it would be neat if you could optionally comment on the timeseries as you added them.
I'm expecting that when we add the more direct support for extracting the text timeseries along with a value time series that the time series "notes" will just come along.
Alternatively one can also just take the member time series and go retrieve that (definitely not ideal though.)
As for the 2nd part of that. There is a "notes" field for each member.