Multiple providers for one source leads to issues with data exports #8

caparker · 2023-04-13T15:21:18Z

Because of the way that we are linking locations (nodes) to providers and then using that information in the data export paths we are seeing situations where a location is flipping back and forth between different 'folders' in our export file structure.

A good example of this can be seen in location 8047. This location is part of the AirNow/StateAir networks and is ingested (historically) under both. Because of the way that data is updated for realtime ingestion we are seeing the provider move back and forth between AirNow and StateAir, so, for a given month, this is what the exports look like

                                               key                                                |    day     |          exported_on          
-------------------------------------------------------------------------------------------------------+------------+-------------------------------
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220101   | 2022-01-01 | 2022-06-06 05:53:14.690365+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220102   | 2022-01-02 | 2022-06-06 05:56:20.485503+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220103   | 2022-01-03 | 2022-06-06 05:57:45.054661+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220104   | 2022-01-04 | 2022-06-06 05:58:09.662885+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220105   | 2022-01-05 | 2022-06-06 06:01:43.863835+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220106   | 2022-01-06 | 2022-06-06 06:04:19.938678+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220107   | 2022-01-07 | 2022-06-06 06:01:54.443595+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220108   | 2022-01-08 | 2022-06-06 06:05:58.245287+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220109   | 2022-01-09 | 2022-06-06 06:08:31.571867+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220110   | 2022-01-10 | 2022-06-06 06:12:29.719793+00
 records/csv.gz/provider=stateair/country=sd/locationid=8047/year=2022/month=01/location-8047-20220111 | 2022-01-11 | 2022-06-06 06:16:20.613882+00
 records/csv.gz/provider=stateair/country=sd/locationid=8047/year=2022/month=01/location-8047-20220112 | 2022-01-12 | 2022-06-06 06:16:39.314988+00
 records/csv.gz/provider=stateair/country=sd/locationid=8047/year=2022/month=01/location-8047-20220113 | 2022-01-13 | 2022-06-06 06:17:30.131499+00
 records/csv.gz/provider=stateair/country=sd/locationid=8047/year=2022/month=01/location-8047-20220114 | 2022-01-14 | 2022-06-06 06:19:12.926728+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220115   | 2022-01-15 | 2022-06-06 06:21:51.057098+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220116   | 2022-01-16 | 2022-06-06 06:24:40.222024+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220117   | 2022-01-17 | 2022-06-06 06:24:01.474895+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220118   | 2022-01-18 | 2022-06-06 06:26:04.872786+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220119   | 2022-01-19 | 2022-06-06 06:29:09.344685+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220120   | 2022-01-20 | 2022-06-06 06:27:48.974003+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220121   | 2022-01-21 | 2022-06-06 06:30:50.746665+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220122   | 2022-01-22 | 2022-06-06 06:33:38.278047+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220123   | 2022-01-23 | 2022-06-06 06:32:24.575535+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220124   | 2022-01-24 | 2022-06-06 06:35:35.183232+00
 records/csv.gz/provider=stateair/country=sd/locationid=8047/year=2022/month=01/location-8047-20220125 | 2022-01-25 | 2022-05-26 23:53:19.242192+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220126   | 2022-01-26 | 2022-06-06 06:40:01.519439+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220127   | 2022-01-27 | 2022-06-06 06:39:21.093998+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220128   | 2022-01-28 | 2022-06-06 06:41:35.865619+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220129   | 2022-01-29 | 2022-06-06 06:44:53.979586+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220130   | 2022-01-30 | 2022-06-06 06:43:13.894873+00
 records/csv.gz/provider=airnow/country=su/locationid=8047/year=2022/month=01/location-8047-20220131   | 2022-01-31 | 2022-06-06 06:46:20.233677+00

We can see this in the sensor_nodes_history table. Here is a summary of the last 30 days which shows how many times the node was updated and which source_name it was updated to. A day with two sources is a day that it switched at least once

  created   |               sources                | n  
------------+--------------------------------------+----
 2023-04-11 | AirNow                               | 48
 2023-04-10 | AirNow, StateAir_KhartoumResidential | 71
 2023-04-09 | AirNow, StateAir_KhartoumResidential | 71
 2023-04-08 | AirNow, StateAir_KhartoumResidential | 70
 2023-04-07 | AirNow, StateAir_KhartoumResidential | 69
 2023-04-06 | AirNow, StateAir_KhartoumResidential | 46
 2023-04-05 | AirNow, StateAir_KhartoumResidential | 66
 2023-04-04 | AirNow, StateAir_KhartoumResidential | 74
 2023-04-03 | AirNow, StateAir_KhartoumResidential | 72
 2023-04-02 | AirNow                               | 72
 2023-04-01 | AirNow                               | 72
 2023-03-31 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-30 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-29 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-28 | AirNow, StateAir_KhartoumResidential | 70
 2023-03-27 | AirNow                               | 72
 2023-03-26 | AirNow                               | 72
 2023-03-25 | AirNow                               | 72
 2023-03-24 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-23 | AirNow                               | 70
 2023-03-22 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-21 | AirNow                               | 72
 2023-03-20 | AirNow                               | 48
 2023-03-19 | AirNow, StateAir_KhartoumResidential | 24
 2023-03-18 | AirNow, StateAir_KhartoumResidential | 37
 2023-03-17 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-16 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-15 | AirNow                               | 72
 2023-03-14 | AirNow, StateAir_KhartoumResidential | 72
 2023-03-13 | AirNow, StateAir_KhartoumResidential | 72

This summary is from production but staging is similar

The solution to part of this is going to be the source/providers updates we have been talking about. The other issue, for example the node being updated every time we are ingesting will require some reworking of how we are comparing nodes on ingest.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple providers for one source leads to issues with data exports #8

Multiple providers for one source leads to issues with data exports #8

caparker commented Apr 13, 2023

Multiple providers for one source leads to issues with data exports #8

Multiple providers for one source leads to issues with data exports #8

Comments

caparker commented Apr 13, 2023