300 changes: 300 additions & 0 deletions docs/source/design/composite-time-series.rst
@@ -0,0 +1,300 @@
#####################
Composite Time Series
#####################

Purpose
=======

It is a challenge for users to identify the correct authoritative time series for a given measurement at a location when multiple time series exist at that location. Additionally, these time series often change over time, either being replaced by completely new ones or changing their interval as newer technologies become available.

Gathering an entire Period of Record (POR) for a value at a location is also rather difficult, and the POR and the "authoritative time series" may be one and the same.


Need
====

#. CWMS and Access-2-Water require a simple mechanism to allow users of data to retrieve the Authoritative Period of Record data for a given measurement without having to understand all of the possible component time series that may be involved.
#. Period-of-Record time series *should* not be created by duplicating data from the component time series and merging them into a new one.
#. The naming of the time series should fit within the existing CWMS Time Series Identifier design and not unreasonably interfere with existing usages.


Caveats
=======

#. It is assumed that CWMS-Vue will, as always, require updates to handle what is created here.
#. i.e. we're not going to let any current limitations of CWMS-Vue hinder our design.


Proposal
========

Description
-----------

CDA should handle a concept of a "Composite Time Series". Whether a Time Series is considered composite will be determined by a specific element of the Time Series Identifier.
Data Administrators will configure which Time Series (members), and the date-time range of each, define the composite time series.
CDA will use this stored information to build the Composite Time Series during a query.

Additional names not used
-------------------------

#. Virtual Time Series
#. Period of Record Time Series

Both names have been discarded. We use "Virtual" in too many other places with a more direct meaning of that word.
For Period-of-Record, while that is the primary use-case, the concept is useful in other situations as well.

Hence, generically, we have a "composite time series".

Axioms

Is there any way to add a comment or remark to a timeseries (e.g. staff gage readings, gate computations, tailwater rating etc...) or to the composite timeseries itself? If not this could be separately managed in a CLOB, but it would be neat if you could optionally comment on the timeseries as you added them.

Contributor Author

I'm expecting that when we add the more direct support for extracting the text timeseries along with a value time series that the time series "notes" will just come along.

Alternatively one can also just take the member time series and go retrieve that (definitely not ideal though.)

As for the 2nd part of that. There is a "notes" field for each member.

------

#. Composite Time Series are Irregular
#. The definition of the composite time series is stored within the CWMS database
#. The members of a composite time series define a continuous range

#. The date ranges of members *MUST* not overlap
#. The date ranges of members *MUST* not have any gaps
#. Data may have gaps; an explanation range should be provided.

Is this user defined? Will it allow for definition of a timeseries that may not have extents that cover that time period (e.g. there's a ~2 month gap between timeseries A that ends at 2014-01-03 12:00 and timeseries B which starts at 2014-03-14 12:00)? What does an explanation range look like (e.g. "no data, start 2014-01-03 12:00, end 2014-03-14 12:00")? Is that assigned automatically if there is a gap in the timeseries?

Contributor Author

ah, I definitely did not describe this clearly enough in the sample data.
couple of things

  1. a time series with missing in it still counts as complete

A gap here means that you have an end date for one member and then... and now I've realized that if you don't have a defined interval it's hard to determine this.

perhaps that should change to SHOULD since I don't think the system can meaningfully define what a "gap" in member is. Does the next start have to be the smallest time unit after the previous end (e.g. nano seconds), if not what is acceptable?

Here's what I was thinking, how do we handle known gaps in service? be it accidental destruction (2 different SPK/SPN gauges have suffered alcohol related removals from service). One site at SPK is removed during most of a year due to no water and it kept suffering vandalism.

So intent is "there's oddly large amount of missing data, how do we report that."

Collaborator

@DanielTOsborne DanielTOsborne May 20, 2025

Here's what I was thinking, how do we handle known gaps in service? be it accidental destruction (2 different SPK/SPN gauges have suffered alcohol related removals from service). One site at SPK is removed during most of a year due to no water and it kept suffering vandalism.

So intent is "there's oddly large amount of missing data, how do we report that."

Well, this is something that doesn't currently exist in CWMS at all, as I realized when developing a system to read in punch tapes. There are notes on the tapes for station maintenance, but I have nowhere to save that information in a meaningful, easy to access way.
I'd argue this is out of scope, since this is a useful feature beyond this discussion.

Contributor Author

I'd argue this is out of scope, since this is a useful feature beyond this discussion.

It's not well documented but if you write to a text timeseries with a TS ID that matches the data TS ID, that effectively becomes the "NOTES".

CDA itself should be set up to allow those entries and retrieve those entries in the current interface. But as you said, not in scope of this.


#. The members of a composite time series measure the same thing (e.g. all members are Elevation; you *cannot* combine elevation and stage as members).
#. The interval and duration of each member *MAY* be different.
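
The range axioms above can be checked mechanically. The following is a minimal sketch, assuming each member range is half-open (inclusive start, exclusive end, as in the member definition later in this document), so that two members are contiguous exactly when one ends at the instant the next starts. ``Member`` and ``find_range_problems`` are hypothetical names used only for illustration; they are not part of CDA.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Member:
    """Hypothetical stand-in for one member of a composite definition."""
    ts_id: str
    start: datetime  # inclusive
    end: datetime    # exclusive


def find_range_problems(members):
    """Return a list of problems; an empty list means the ranges are contiguous."""
    problems = []
    ordered = sorted(members, key=lambda m: m.start)
    for m in ordered:
        if m.end <= m.start:
            problems.append(f"empty range: {m.ts_id}")
    # Compare each member with the one that follows it in time.
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start < prev.end:
            problems.append(f"overlap: {prev.ts_id} / {nxt.ts_id}")
        elif nxt.start > prev.end:
            problems.append(f"gap: {prev.ts_id} / {nxt.ts_id}")
    return problems
```

Under this assumption, the question of what counts as a "gap" between members reduces to an equality check on the boundary instants and does not depend on any member's interval.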


Time Series Naming
------------------

Option 1
~~~~~~~~

`<Location Id>.<Parameter>.<Parameter Type>.Composite.var.<version>`

+----------------------+------------------------------------------------------------------------------------------------------------------------+
| Element | Description |
+----------------------+------------------------------------------------------------------------------------------------------------------------+
|Location Id |As the normal CWMS TS ID, the location for this measure |
+----------------------+------------------------------------------------------------------------------------------------------------------------+
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) |
+----------------------+------------------------------------------------------------------------------------------------------------------------+
|Parameter Type |As Normal CWMS TS ID, Instantaneous, average, total, etc |
+----------------------+------------------------------------------------------------------------------------------------------------------------+
|Interval -\> Composite| Marker that this time series does not have a fixed interval and is built from various member time series. |
Collaborator

@DanielTOsborne DanielTOsborne May 20, 2025

I'm of the opinion that:

  1. duration MUST be the same for all composite members. If you change the window size of an averaging function, then you are producing different data. It's not the same as simply swapping out a sensor.
  2. interval MUST be the same for all composite members. Trying to retrieve a composite time series where you have no idea what you're going to get is a nightmare to code for. If you want to combine different intervals, create a computation to create the requisite interval data from other intervals.

Tying back into an earlier section, that means the composite doesn't have to be irregular, and could potentially be regular, lrts, or prts depending on the sources.
If all timeseries in a composite are hourly regular, for example, empty values could be generated to fill in the gaps.

Now, that leads to the section I highlighted:
I say make the naming somewhat arbitrary, like it is now. Allow it to operate like an alias, so the user creates the name they want the timeseries to be labeled, then say "this is a composite".
For example, that would allow the user to just specify .Composite in the version, and prevent confusion by overloading the other parameters.

<Location Id>.<Parameter>.<Parameter Type>.<Interval>.<Duration>.Composite

If that's not feasible, then perhaps adding to the interval, like lrts did:
Something like 1HourComp:
<Location Id>.<Parameter>.<Parameter Type>.1HourComp.<Duration>.<Version>

Otherwise, how would you specify composite data for different intervals and types, and keep them separate?

At SPK we have period-of-record data like this:
New Bullards Bar.Elev.Inst.1Hour.0.POR
New Bullards Bar.Elev.Inst.~1Day.0.POR

Currently that's a separate timeseries with duplicate data. But it doesn't have to be.
They could be changed internally to be composites, then maintain the same names, and everything else matches the rest of the system (parameter type, interval, duration).

Otherwise, you end up with something like:
New Bullards Bar.Elev.Inst.Composite.0.POR - What is that? Hourly, daily? What if I only want daily data?

New Bullards Bar.Elev.Composite.~1Day.0.POR - Is that averaged data, or instantaneous?

Contributor Author

interval MUST be the same for all composite members. Trying to retrieve a composite time series where you have no idea what you're going to get is a nightmare to code for. If you want to combine different intervals, create a computation to create the requisite interval data from other intervals.

The intent is to avoid duplicating data while providing a simple name for the entire range of the Period of Record for the measurement at a location (the period of record is where this started).

That said, I think I agree with the Duration. Or at least that if a duration is specified everything must match. But as a matter of historical record and how things changed, the durations do change and would be indicated in each.

I think one thing to point out, the results of this aren't meant for the entire PoR to be cleanly put into a display, but to provide the data as it was used at that time. Downstream users, say doing a study, would need to determine how they would interpolate/correct any gaps.

Contributor Author

what about

New Bullards Bar.Elev[Composite]...

As you said, it's an alias, that will be processed before further work and we don't use [] for anything so far as I'm aware. Certainly other options.

That said, since it is an alias. maybe we should just add yet-another .

  • New Bullards Bar.Elev.Inst.0.0.POR.Composite basically the full history regardless of interval
  • New Bullards Bar.Elev.Inst.~1Day.0.Calc-val.Composite useful for situations of multiple sensors and setting an "active".

That said, it would be more work to use in say, OpenDCS and any client software, but it is an option. But I do agree that fixing it to one location does limit things that appear rather useful.

+----------------------+------------------------------------------------------------------------------------------------------------------------+
|Duration -\> var |Duration of average or total may change over time with new members, duration will be indicated in the member definition |
+----------------------+------------------------------------------------------------------------------------------------------------------------+
|Version |As Normal CWMS TS ID |

Could a common version name denote it as authoritative? Or does just the existence of a composite timeseries imply that?

Contributor Author

That's a possibility. I had put a placeholder "is-authoritative" flag in the composite definition.

Though I agree the version is technically a good place for that, it does seem to get a bit... overused at times.

There are certainly arguments to be made in either case, so we'll wait for commentary from others to tip the scales.

+----------------------+------------------------------------------------------------------------------------------------------------------------+
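
A sketch of how Option 1 could be detected, assuming a TS ID always has exactly six dot-separated elements with no dots inside any element, and assuming the ``Composite`` marker is compared case-insensitively. ``TsId``, ``parse_ts_id`` and ``is_option1_composite`` are hypothetical names for illustration only.

```python
from typing import NamedTuple


class TsId(NamedTuple):
    location: str
    parameter: str
    parameter_type: str
    interval: str
    duration: str
    version: str


def parse_ts_id(ts_id: str) -> TsId:
    """Split a TS ID into its six elements (assumes no '.' inside an element)."""
    parts = ts_id.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 elements, got {len(parts)}: {ts_id!r}")
    return TsId(*parts)


def is_option1_composite(ts_id: str) -> bool:
    """Option 1: the interval element carries the 'Composite' marker."""
    return parse_ts_id(ts_id).interval.lower() == "composite"
```

The same parser works for the other options by testing a different element, for example ``parameter_type`` for Option 2 or ``version`` for Option 3.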


Option 2

Option 2 makes more sense to me. I'm unsure how Option 1 would deal with potentially varying parameter types among a set of composite data.

I'm a little curious about the implications of doing something like <Location Id>.<Parameter>.Var.0.0.Composite. That would lose the flexibility of using the version as a sort of unique identifier, but I can't really think of a use case for needing multiple composites for one location/parameter off of the top of my head. Using Option 2's style I'm not sure what I would use for the version besides "Composite" or "POR" -- I think it's probably somewhat rare to have a consistent source for a full set of POR data.

Contributor Author

I think @msweier made a fairly decent case for why you might have more than one composite for a location+measure. It does make sense to me to have a "single authoritative" time series followed by "all data with interval X". Really depends on exactly what you're doing with the data.

Collaborator

@DanielTOsborne DanielTOsborne May 20, 2025

As I commented elsewhere, I say make it work like aliases, so if you fetch a composite, it checks the composite list first, if not found there, then regular timeseries, or something like that.
Then you can have it arbitrarily named, such as <Location Id>.<Parameter>.Var.0.0.Composite or <Location Id>.<Parameter>.Var.0.0.Composite;Lowess-SPK

Contributor Author

oh, that's interesting. technically you wouldn't even need Composite in the name.... will have to think about that. Going to type it up though.

~~~~~~~~

`<Location Id>.<Parameter>.Composite.0.0.<version>`


+------------------------+------------------------------------------------------------------------------------------------------------------------+
| Element | Description |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Location Id |As the normal CWMS TS ID, the location for this measure |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Parameter Type Composite|Marker that this time series does not have fixed information and is built from various member time series. |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Interval -\> 0 |Interval of data elements; may change over time with new members. The interval will be indicated in the member definition |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Duration -\> 0 |Duration of average or total; may change over time with new members. The duration will be indicated in the member definition|
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Version |As Normal CWMS TS ID |
+------------------------+------------------------------------------------------------------------------------------------------------------------+

I like these options, but it would be nice to differentiate between a POR timeseries that includes all best available intervals (e.g. daily inst, 4 hr, 1 hr, 15 minute) and a POR timeseries that includes the best available on a daily interval (e.g. 8 am inst or daily avg). MVP's merged TS denotes these as ~15Minutes and ~1Day but maybe there's a better way. I'm thinking some way like the USGS makes it easy to pick instantaneous value data vs. daily data.

Contributor Author

That seems like the type of information to put in the version, if desired.

My assumption with the POR time series was that it should be suitable for "this is the full record as we have it" knowing that over time the official record has improved methods of measurement.

So like the first few decades could be daily instantaneous, and the next decade 12 hours, then 1 hour, then 15 minutes, and maybe things would change to an average or not. But if you go further down in the document you'll see that the returned time series values also includes the members with their definition.

So yes, you could make a composite time series that only included certain intervals and durations, but to the composite system itself it wouldn't care.

That said we could open up the definition to allow the interval and duration to be set, we would then need to decide if that is enforced.

For example:

  1. if the composite is 1Day, do we limit every member to 1Day. (This applies to Local Regular as well)
  2. if the composite is ~1Day, do we limit every member to ~1Day or allow others since ~1Day means most likely 1 day but could be different

I'm not opposed, I don't think that adds too much complexity, but it's another one of those things where more feedback from the group would be good.

Collaborator

@DanielTOsborne DanielTOsborne May 20, 2025

That seems like the type of information to put in the version, if desired.

My assumption with the POR time series was that it should be suitable for "this is the full record as we have it" knowing that over time the official record has improved methods of measurement.

This is completely different from my understanding of what we were going to have for POR. Honestly, I can't think of any use case where having everything jumbled into a single time series is remotely useful. As it is, I have a hard time wanting to classify readings from two different sensor types (e.g. bubbler vs shaft-encoder) into a single POR. It's not the same data. Yes, it represents the same real-world measurement, but how useful is it to have them together? You can't run any worthwhile scientific/mathematical analysis on the data, since different sensors respond in different ways and can throw off expectations.
Imagine training a ML model off bubbler recorded data, then trying to have it work with shaft-encoder data. It won't return expected results.

Also, what if we're actively recording the same measurement with two different sensors? Which do we put into the POR?

Contributor Author

You can't run any worthwhile scientific/mathematical analysis on the data, since different sensors respond in different ways and can throw off expectations.
Imagine training a ML model off bubbler recorded data, then trying to have it work with shaft-encoder data. It won't return expected results.

not automatically, no.

Also, what if we're actively recording the same measurement with two different sensors? Which do we put into the POR?

For a generic composite time series, whatever you want. For "Period of Record" that's intended to be what was used to make any decisions.


The zeros could also be ``var``.


Option 3
~~~~~~~~

`<Location Id>.<Parameter>.<Parameter Type>.<Interval>.<Duration>.Composite`


+------------------------+------------------------------------------------------------------------------------------------------------------------+
| Element | Description |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Location Id |As the normal CWMS TS ID, the location for this measure |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Parameter Type |Marker that this time series does not have fixed information and is built from various member time series. |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Interval |Interval of data elements; may change over time with new members. The interval will be indicated in the member definition |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Duration |Duration of average or total; may change over time with new members. The duration will be indicated in the member definition|
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Version |Composite or POR ... or check for composite at the front/back? |
+------------------------+------------------------------------------------------------------------------------------------------------------------+

From Daniel

Argument Against: the "Version" field is freeform and we often encode other information in it.
Argument Against above argument: That said, perhaps forcing the version to be "clean" is the right choice here.


Option 4
~~~~~~~~

`<Location Id>.<Parameter>[Composite].<Parameter Type>.<Interval>.<Duration>.<Version>`


+------------------------+------------------------------------------------------------------------------------------------------------------------+
| Element | Description |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Location Id |As the normal CWMS TS ID, the location for this measure |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Parameter Type |Marker that this time series does not have fixed information and is built from various member time series. |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Interval |Interval of data elements; may change over time with new members. The interval will be indicated in the member definition |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Duration |Duration of average or total; may change over time with new members. The duration will be indicated in the member definition|
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Version |As Normal CWMS TS ID |
+------------------------+------------------------------------------------------------------------------------------------------------------------+


This form with something in [] has been discussed for embedding TimeZone and Offset information into the interval. Arguably this could go in any field.


Option 5
~~~~~~~~

`<Location Id>.<Parameter>.<Parameter Type>.<Interval>.<Duration>.<Version>` and/or arbitrary TS "alias"


+------------------------+------------------------------------------------------------------------------------------------------------------------+
| Element | Description |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Location Id |As the normal CWMS TS ID, the location for this measure |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Parameter Type |Marker that this time series does not have fixed information and is built from various member time series. |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Interval |Interval of data elements; may change over time with new members. The interval will be indicated in the member definition |
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Duration |Duration of average or total; may change over time with new members. The duration will be indicated in the member definition|
+------------------------+------------------------------------------------------------------------------------------------------------------------+
|Version |As Normal CWMS TS ID |
+------------------------+------------------------------------------------------------------------------------------------------------------------+

However, on request for the timeseries the list of composite time series is consulted and used if present, otherwise passthrough to normal
time series retrieval.
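
The alias-style lookup described above could be sketched as follows. This is only an illustration: the ``composites`` registry, the ``(office, name)`` key normalization, and the two fetch callbacks are assumptions, not existing CDA code.

```python
def retrieve(office, ts_id, composites, fetch_composite, fetch_regular):
    """Consult the composite registry first; otherwise pass through.

    composites: dict mapping (OFFICE, lower-cased ts id) -> composite definition.
    fetch_composite / fetch_regular: callables supplied by the caller.
    """
    # Assumption: office and TS ID are matched case-insensitively.
    definition = composites.get((office.upper(), ts_id.lower()))
    if definition is not None:
        return fetch_composite(definition)
    return fetch_regular(office, ts_id)
```

With this approach the name itself carries no marker, so existing names such as ``New Bullards Bar.Elev.Inst.1Hour.0.POR`` could be backed by a composite definition without being renamed.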


Composite Time Series Definition
================================

.. code-block:: jsonc

    {
        "office": "<string>",
        "name": "<ts id name>",
        "is-authoritative": true, // or is authoritative. to distinguish between other possible use-cases?
        "members": [
            {
                "time-series-id": "TS ID for this range",
                "start": "start date of this range", // Inclusive
                "end": "end date of this range", // Exclusive
                "notes": "text"
            }
        ]
        // array above *should* be sorted by start when provided to user.
    }
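
As a hedged illustration of the "sorted by start" expectation, the sketch below loads a definition and orders its members. It assumes the stored form is plain JSON (no comments) and that ``start`` and ``end`` are ISO-8601 strings with an explicit offset; neither is fixed by this design yet, and ``load_definition`` is a hypothetical helper.

```python
import json
from datetime import datetime


def load_definition(text: str) -> dict:
    """Parse a composite definition and sort its members by start time."""
    definition = json.loads(text)
    # Assumption: "start" is an ISO-8601 string such as 2010-01-01T00:00:00+00:00.
    definition["members"].sort(key=lambda m: datetime.fromisoformat(m["start"]))
    return definition
```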


Operations required:

* Create
* Remove member (ts id + range)
* Add member
* List members
* Replace all members?
* Delete


Composite Time Series Response
==============================

.. code-block:: jsonc

    {
        // ... as current TimeSeries JSON
        "composite-members-present": [
            // member definition from above
        ]
    }
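
One way the response could be assembled is sketched below, assuming the requested window is half-open like the member ranges and that a member counts as "present" whenever its range intersects the window, even if it contributes no values. ``build_composite`` and the in-memory ``data`` mapping are hypothetical stand-ins for the real retrieval.

```python
from datetime import datetime


def build_composite(members, data, begin, end):
    """Assemble values for the window [begin, end) from member time series.

    members: list of dicts with "time-series-id", "start", "end" (datetimes,
             end exclusive).
    data:    dict mapping ts id -> list of (time, value) pairs sorted by time.
    Returns (values, members_present).
    """
    values = []
    present = []
    for member in sorted(members, key=lambda m: m["start"]):
        lo = max(member["start"], begin)
        hi = min(member["end"], end)
        if lo >= hi:
            continue  # member range does not intersect the requested window
        present.append(member)
        series = data.get(member["time-series-id"], [])
        # Only values inside the member's own range are taken from it.
        values.extend((t, v) for t, v in series if lo <= t < hi)
    return values, present
```

No data is copied into a new stored time series here; the values are selected from each member at query time, which is the intent stated under Need.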


Supported Operations:

* Get, through existing TimeSeries classes.


Storage of member information
================================

#. Store in a CLOB as we refine the design - cache appropriately in memory to avoid any major performance issues.
#. Create appropriate tables once the design is stable - still cache things.

System responsibility for "knowing" to process composite.
=========================================================

Time Series Catalog
-------------------

Time Series Catalog should show composite time series and allow searching by "authoritative".
Collaborator

just FYI: we'd have to teach CWMS-Vue a new Interval definition or a new Parameter Type in order to show up properly. I know you called up inevitable CWMS-Vue updates elsewhere, but wanted to be explicit.


TimeSeries DTO
--------------

Add nullable "members" property.

TimeSeriesDao
-------------

If the system sees the "Composite" marker, or otherwise determines that the time series is composite, it retrieves the members for the range and builds the time series.

.. NOTE::
   Considering the user may request the *entire* Period-of-Record, this is a good opportunity to detect that,
   start the retrieval in a job queue, and return a status URL to the user for future download. I have seen such mechanisms
   for bulk data in other systems. Maybe return an "I'm working on it" variant that the controller knows how to format.

   Perhaps we do this for data beyond "x amount"?

Error handling and other conditions.
====================================

Versioned (date) time series
----------------------------

It is an error to specify a Version (date) when requesting composite data.

Datum conversions
-----------------

Retrievers of the Period-of-Record *SHOULD* be able to retrieve the data in a single datum. Composite retrieval should respond
as described in https://github.com/USACE/cwms-data-api/issues/1102 and convert each member as appropriate.


On the saving of a composite definition
---------------------------------------

When even a single member is added, the full definition needs to be checked to ensure the ranges are still non-overlapping and continuous.

References
==========

#. https://github.com/USACE/cwms-data-api/discussions/956
#. https://github.com/USACE/cwms-data-api/issues/955
#. https://www.hec.usace.army.mil/confluence/spaces/CWMS/pages/290456000/Virtual+Timeseries
#. https://discourse.hecdev.net/t/period-of-record-timeseries/3859/2
12 changes: 12 additions & 0 deletions docs/source/design/index.rst
@@ -0,0 +1,12 @@
################
Design Documents
################

The following pages formally document current and proposed designs
relating to operations and usage of data.

.. toctree::
   :maxdepth: 2
   :caption: Introduction

   Composite Time Series <./composite-time-series.rst>
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -18,4 +18,5 @@ Welcome to CWMS Data API documentation!
Endpoints <./endpoints/index.rst>
Glossary <./glossary.rst>
FAQ <./faq.rst>
Client Libraries <./libraries.rst>
Client Libraries <./libraries.rst>
Design Documents <./design/index.rst>
3 changes: 3 additions & 0 deletions docs/source/libraries/java.rst
@@ -0,0 +1,3 @@
####
Java
####