This repository accumulates public data from Deutsche Bahn, the biggest German train company. The data is fetched from the public DB API and saved 4 times a day using GitHub Actions. It is used to build a website with statistics about train delays and canceled trains.
The timetables-api is used to collect the raw data. The API is free to query up to 60 times per second, and the data is licensed under CC BY 4.0.
The timetable-plan-api returns the planned timetable for a station at a specific hour and day. The timetable-changes-api returns all changes. The changes API is queried every 6 hours so that no changes are missed.
The API is queried using the EVA numbers (station IDs) of the biggest train stations. To get the EVA numbers, the Station Data API is used to fetch all stations with category 1 or 2. The responses are saved in the data folder. Each day is a new subfolder, and the suffix of each file is the hour in UTC when the change request was made or the hour of the planned train schedule.
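The station selection described above can be sketched in Python. This is only an illustration: the record shape (a `category` field and an `evaNumbers` list whose entries carry a `number`) is an assumption about the Station Data API response, the helper name `select_evas` is hypothetical, and the sample records are made up. The repository's own script may work differently.

```python
def select_evas(stations: list[dict], categories: tuple[int, ...] = (1, 2)) -> list[str]:
    """Collect EVA numbers of all stations whose category is in `categories`.

    Assumed (not confirmed) record shape:
    {"name": str, "category": int, "evaNumbers": [{"number": int}, ...]}
    """
    evas: list[str] = []
    for station in stations:
        # Skip stations outside the requested categories.
        if station.get("category") not in categories:
            continue
        # A station may carry more than one EVA number; keep all of them.
        for eva in station.get("evaNumbers", []):
            evas.append(str(eva["number"]))
    return evas


# Made-up sample records, for illustration only.
sample_stations = [
    {"name": "Station A", "category": 1, "evaNumbers": [{"number": 1000001}]},
    {"name": "Station B", "category": 4, "evaNumbers": [{"number": 1000002}]},
    {"name": "Station C", "category": 2, "evaNumbers": [{"number": 1000003}, {"number": 1000004}]},
]

if __name__ == "__main__":
    print(select_evas(sample_stations))
```

Keeping the filter as a pure function over already-parsed records makes it easy to test without network access or API credentials.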
You can explore the API at https://editor.swagger.io/ together with the OpenAPI documentation, which you can download from here.
An example curl command to query the plan api:
curl -s -H "DB-Api-Key: $API_KEY" -H "DB-Client-Id: $CLIENT_ID" -H "accept: application/xml" "https://apis.deutschebahn.com/db-api-marketplace/apis/timetables/v1/plan/08000260/$(date +"%y%m%d")/$(date +"%H")"
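The same request URL can be assembled in Python. The sketch below only mirrors the path layout and headers of the curl command above (`/plan/<eva>/<YYMMDD>/<HH>`). The helper names `build_plan_url` and `build_headers` are hypothetical and not part of the repository, and no request is actually sent.

```python
from datetime import datetime

# Base URL taken from the curl example above.
BASE_URL = "https://apis.deutschebahn.com/db-api-marketplace/apis/timetables/v1"


def build_plan_url(eva: str, when: datetime) -> str:
    """Return the plan endpoint URL for one station and one hour."""
    date_part = when.strftime("%y%m%d")  # two-digit year, month, day, like `date +"%y%m%d"`
    hour_part = when.strftime("%H")      # zero-padded 24h hour, like `date +"%H"`
    return f"{BASE_URL}/plan/{eva}/{date_part}/{hour_part}"


def build_headers(api_key: str, client_id: str) -> dict[str, str]:
    """Return the same headers the curl command passes with -H."""
    return {
        "DB-Api-Key": api_key,
        "DB-Client-Id": client_id,
        "accept": "application/xml",
    }


if __name__ == "__main__":
    # Prints .../plan/08000260/240701/09
    print(build_plan_url("08000260", datetime(2024, 7, 1, 9)))
```

The URL and headers can then be passed to any HTTP client, for example `urllib.request` from the standard library.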
The database contains the following columns that track train schedules, delays, and changes:
station
: String - The name of the station where the train stop occurs
train_name
: String - The identifier of the train, combining train type and number (e.g., "IC 123" or "RE 10")
train_type
: String - The type of train service (e.g., "IC", "ICE", "EC", "RE")
final_destination_station
: String - The final destination station of the train's journey
delay_in_min
: Integer - The delay in minutes (calculated from departure delay if available, otherwise arrival delay)
time
: Timestamp - The effective time of the train stop (uses departure time if available, otherwise arrival time)
arrival_planned_time
: Timestamp - The scheduled arrival time at the station
arrival_change_time
: Timestamp - The actual/modified arrival time. If no changes occurred, equals the planned time
departure_planned_time
: Timestamp - The scheduled departure time from the station
departure_change_time
: Timestamp - The actual/modified departure time. If no changes occurred, equals the planned time
is_canceled
: Boolean - Indicates whether the train stop was canceled (true) or not (false)
train_line_ride_id
: String - Unique identifier for a specific train journey
train_line_station_num
: Integer - The sequential number of this station stop within the train's journey
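The "departure if available, otherwise arrival" rule used by `delay_in_min` and `time` can be sketched as follows. This is an illustration of the rule as described in the column list, not the repository's actual code. The function names are hypothetical, missing values are assumed to be `None`, and rounding down to whole minutes is an assumption.

```python
from datetime import datetime
from typing import Optional


def effective_time(arrival: Optional[datetime], departure: Optional[datetime]) -> Optional[datetime]:
    """Pick the departure time if present, otherwise the arrival time."""
    return departure if departure is not None else arrival


def delay_in_min(
    arrival_planned: Optional[datetime],
    arrival_change: Optional[datetime],
    departure_planned: Optional[datetime],
    departure_change: Optional[datetime],
) -> Optional[int]:
    """Delay in whole minutes, preferring the departure delay over the arrival delay."""
    if departure_planned is not None and departure_change is not None:
        delta = departure_change - departure_planned
    elif arrival_planned is not None and arrival_change is not None:
        delta = arrival_change - arrival_planned
    else:
        return None  # neither a departure nor an arrival pair is available
    # Assumption: round down to whole minutes.
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    planned = datetime(2024, 7, 1, 9, 0)
    changed = datetime(2024, 7, 1, 9, 7)
    print(delay_in_min(None, None, planned, changed))
```

Because an unchanged stop has a change time equal to the planned time, the same function yields a delay of 0 for stops without changes.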
- Clone the repository:
    git clone https://github.com/piebro/deutsche-bahn-data.git
    cd deutsche-bahn-data
- Create and activate a virtual environment:
    python3 -m venv .venv
    source .venv/bin/activate
- Install the required dependencies:
    pip install -r requirements.txt
- Create a monthly data release:
    python create_monthly_data_release.py "YYYY-MM"
  Replace YYYY-MM with the desired year and month.
Contributions are welcome. Open an issue if you want to report a bug, have an idea, or want to propose a change.
All code in this project is licensed under the MIT License. The data is licensed under Attribution 4.0 International (CC BY 4.0) by Deutsche Bahn.