Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing data #43

Open
NoamGaash opened this issue Jun 25, 2024 · 3 comments
Open

Missing data #43

NoamGaash opened this issue Jun 25, 2024 · 3 comments

Comments

@NoamGaash
Copy link
Member

When querying some dates, is seems like there are missing SIRI rides:

https://open-bus-stride-api.hasadna.org.il/gtfs_rides_agg/group_by?date_from=2024-06-18&date_to=2024-06-19&group_by=operator_ref,gtfs_route_date&exclude_hour_from=23&exclude_hour_to=2

image

@OriHoch
Copy link
Contributor

OriHoch commented Jun 25, 2024

there are siri rides for those dates, please try to pinpoint the specific problem

$ curl 'https://open-bus-stride-api.hasadna.org.il/siri_rides/list?get_count=true&scheduled_start_time_from=2024-06-18T00%3A00%3A00%2B02%3A00&scheduled_start_time_to=2024-06-19T00%3A00%3A00%2B02%3A00&order_by=id%20asc'
80441

@NoamGaash
Copy link
Member Author

When running:

$ curl https://open-bus-stride-api.hasadna.org.il/gtfs_rides_agg/group_by\?date_from\=2024-06-18\&date_to\=2024-06-19\&group_by\=operator_ref | jq ".[] | .total_actual_rides"

we can see array of zeroes.

Also, when running:

select
    agg.gtfs_route_id,
    agg.gtfs_route_hour,
    agg.num_planned_rides,
    agg.num_actual_rides,
    rt.operator_ref
from gtfs_rides_agg_by_hour agg, gtfs_route rt
where
    agg.gtfs_route_id = rt.id
    and agg.gtfs_route_date >= '2024-06-18 00:00:00+00'
    and agg.gtfs_route_date <= '2024-06-19 00:00:00+00'

the num_actual_rides column is zero.

@OriHoch
Copy link
Contributor

OriHoch commented Jun 25, 2024

thanks, so gtfs_rides_agg_by_hour is a view, it doesn't actually contain the data it's just an interface to a query, so what I need to know is in which table/column the actual data is missing

this is the definition of the gtfs_rides_agg_by_hour view:

SELECT gtfs_ride.gtfs_route_id,
       date_trunc('hour'::text, gtfs_ride.start_time)                     AS gtfs_route_hour,
       date_trunc('day'::text, gtfs_ride.start_time)                      AS gtfs_route_date,
       count(*)                                                           AS num_planned_rides,
       count(*) FILTER (WHERE sr.scheduled_time_gtfs_ride_id IS NOT NULL) AS num_actual_rides
FROM gtfs_ride
         LEFT JOIN siri_ride sr ON gtfs_ride.id = sr.scheduled_time_gtfs_ride_id
GROUP BY gtfs_ride.gtfs_route_id, (date_trunc('hour'::text, gtfs_ride.start_time)),
         (date_trunc('day'::text, gtfs_ride.start_time))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants