Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple providers for one source leads to issues with data exports #8

Open
caparker opened this issue Apr 13, 2023 · 0 comments
Open

Comments

@caparker
Copy link
Collaborator

Because of the way that we are linking locations (nodes) to providers and then using that information in the data export paths we are seeing situations where a location is flipping back and forth between different 'folders' in our export file structure.

A good example of this can be seen in location 8047. This location is part of the AirNow/StateAir networks and is ingested (historically) under both. Because of the way that data is updated for realtime ingestion we are seeing the provider move back and forth between AirNow and StateAir, so, for a given month, this is what the exports look like

                                               key                                                |    day     |          exported_on          
-------------------------------------------------------------------------------------------------------+------------+-------------------------------
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220101   | 2022-01-01 | 2022-06-06 05:53:14.690365+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220102   | 2022-01-02 | 2022-06-06 05:56:20.485503+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220103   | 2022-01-03 | 2022-06-06 05:57:45.054661+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220104   | 2022-01-04 | 2022-06-06 05:58:09.662885+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220105   | 2022-01-05 | 2022-06-06 06:01:43.863835+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220106   | 2022-01-06 | 2022-06-06 06:04:19.938678+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220107   | 2022-01-07 | 2022-06-06 06:01:54.443595+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220108   | 2022-01-08 | 2022-06-06 06:05:58.245287+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220109   | 2022-01-09 | 2022-06-06 06:08:31.571867+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220110   | 2022-01-10 | 2022-06-06 06:12:29.719793+00
 records/csv.gz/provider=stateair/country=sd/locationid=8047/year=2022/month=01/location-8047-20220111 | 2022-01-11 | 2022-06-06 06:16:20.613882+00
 records/csv.gz/provider=stateair/country=sd/locationid=8047/year=2022/month=01/location-8047-20220112 | 2022-01-12 | 2022-06-06 06:16:39.314988+00
 records/csv.gz/provider=stateair/country=sd/locationid=8047/year=2022/month=01/location-8047-20220113 | 2022-01-13 | 2022-06-06 06:17:30.131499+00
 records/csv.gz/provider=stateair/country=sd/locationid=8047/year=2022/month=01/location-8047-20220114 | 2022-01-14 | 2022-06-06 06:19:12.926728+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220115   | 2022-01-15 | 2022-06-06 06:21:51.057098+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220116   | 2022-01-16 | 2022-06-06 06:24:40.222024+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220117   | 2022-01-17 | 2022-06-06 06:24:01.474895+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220118   | 2022-01-18 | 2022-06-06 06:26:04.872786+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220119   | 2022-01-19 | 2022-06-06 06:29:09.344685+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220120   | 2022-01-20 | 2022-06-06 06:27:48.974003+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220121   | 2022-01-21 | 2022-06-06 06:30:50.746665+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220122   | 2022-01-22 | 2022-06-06 06:33:38.278047+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220123   | 2022-01-23 | 2022-06-06 06:32:24.575535+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220124   | 2022-01-24 | 2022-06-06 06:35:35.183232+00
 records/csv.gz/provider=stateair/country=sd/locationid=8047/year=2022/month=01/location-8047-20220125 | 2022-01-25 | 2022-05-26 23:53:19.242192+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220126   | 2022-01-26 | 2022-06-06 06:40:01.519439+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220127   | 2022-01-27 | 2022-06-06 06:39:21.093998+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220128   | 2022-01-28 | 2022-06-06 06:41:35.865619+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220129   | 2022-01-29 | 2022-06-06 06:44:53.979586+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220130   | 2022-01-30 | 2022-06-06 06:43:13.894873+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220131   | 2022-01-31 | 2022-06-06 06:46:20.233677+00

We can see this in the sensor_nodes_history table. Here is a summary of the last 30 days which shows how many times the node was updated and which source_name it was updated to. A day with two sources is a day that it switched at least once

  created   |               sources                | n  
------------+--------------------------------------+----
 2023-04-11 | AirNow                               | 48
 2023-04-10 | AirNow, StateAir_KhartoumResidential | 71
 2023-04-09 | AirNow, StateAir_KhartoumResidential | 71
 2023-04-08 | AirNow, StateAir_KhartoumResidential | 70
 2023-04-07 | AirNow, StateAir_KhartoumResidential | 69
 2023-04-06 | AirNow, StateAir_KhartoumResidential | 46
 2023-04-05 | AirNow, StateAir_KhartoumResidential | 66
 2023-04-04 | AirNow, StateAir_KhartoumResidential | 74
 2023-04-03 | AirNow, StateAir_KhartoumResidential | 72
 2023-04-02 | AirNow                               | 72
 2023-04-01 | AirNow                               | 72
 2023-03-31 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-30 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-29 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-28 | AirNow, StateAir_KhartoumResidential | 70
 2023-03-27 | AirNow                               | 72
 2023-03-26 | AirNow                               | 72
 2023-03-25 | AirNow                               | 72
 2023-03-24 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-23 | AirNow                               | 70
 2023-03-22 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-21 | AirNow                               | 72
 2023-03-20 | AirNow                               | 48
 2023-03-19 | AirNow, StateAir_KhartoumResidential | 24
 2023-03-18 | AirNow, StateAir_KhartoumResidential | 37
 2023-03-17 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-16 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-15 | AirNow                               | 72
 2023-03-14 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-13 | AirNow, StateAir_KhartoumResidential | 72

This summary is from production but staging is similar

The solution to part of this is going to be the source/providers updates we have been talking about. The other issue, for example the node being updated every time we are ingesting will require some reworking of how we are comparing nodes on ingest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant