KDE-Analysts-Bikeshare Toronto Analysis

Bikeshare Analysis:

This project analyzes raw data collected from Bikeshare Toronto and weather data from the City of Toronto to determine the relationships between weather, ridership, and geographical distribution for the years 2017 through 2020.

Motivation and Business Objective

The motivation for our project is to determine the weather conditions and geographical locations that yield the highest ridership. This information would allow Bikeshare Toronto to expand its program into new GTA neighbourhoods and densify existing downtown infrastructure in districts with high ridership.

Description of Project Files and Process

The Data Wrangling + Cleaning folder contains the datasets. The project itself is split into 3 Jupyter notebooks: "Cleaning and Wrangling", "Exploratory Analysis", and "Modelling".

The "Data Cleaning and Wrangling Final" file combines the 3 types of datasets (ridership, weather, and bikeshare station data) into 1 clean dataframe for analysis. This process included labelling datapoints consistently to reduce null values when joining datasets, standardizing the timestamp format, and keeping only the parameters relevant to the analysis. The main challenge in this section was manually matching station names that were labelled differently in the ridership dataset and the bikeshare station dataset, as these mismatches were the primary cause of null values within the dataset.
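As an illustration of the station-name matching described above, here is a minimal sketch in pandas. It is not the notebook's actual code: the column names (`start_station_name`, `station_name`, `station_id`), the sample rows, and the `MANUAL_OVERRIDES` table are all hypothetical. The idea is to normalize names on both sides, apply hand-built overrides for the cases normalization cannot fix, and then use a left join to surface any trips that still fail to match.

```python
import pandas as pd

# Hypothetical sample data; the real datasets use their own schemas.
trips = pd.DataFrame({
    "trip_id": [1, 2, 3, 4],
    "start_station_name": [
        "Union Station",
        "  union station ",
        "King St W / Spadina Ave.",
        "Bay St and College St",
    ],
})
stations = pd.DataFrame({
    "station_id": [7000, 7001, 7002],
    "station_name": [
        "Union Station",
        "King St W / Spadina Ave",
        "Bay St / College St",
    ],
})

# Hand-built overrides (hypothetical) for names that simple
# normalization cannot reconcile.
MANUAL_OVERRIDES = {"bay st and college st": "bay st / college st"}


def normalize_name(name: str) -> str:
    """Lower-case, collapse whitespace, strip trailing periods, apply overrides."""
    key = " ".join(name.lower().split()).rstrip(".")
    return MANUAL_OVERRIDES.get(key, key)


trips["station_key"] = trips["start_station_name"].map(normalize_name)
stations["station_key"] = stations["station_name"].map(normalize_name)

# A left join keeps every trip; the indicator column flags unmatched rows,
# which would otherwise show up as null station data.
merged = trips.merge(
    stations[["station_key", "station_id"]],
    on="station_key",
    how="left",
    indicator=True,
)
unmatched = merged.loc[merged["_merge"] == "left_only", "start_station_name"].unique()
print("Unmatched station names:", list(unmatched))
```

Any names printed as unmatched are candidates for a new entry in the override table.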

In the "Exploratory Analysis" section, we plotted ridership over time to see how it changed throughout the years. We also analyzed user behaviour at monthly, weekly, and daily intervals to understand how ridership fluctuates, and examined how weather variables such as wind speed, humidity, and visibility affected user behaviour. Lastly, we determined the geographical districts with the highest ridership and the average time a user spends per trip. We compared these results before and after the pandemic lockdown to determine the change in ridership.
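The interval-based aggregation described above could be sketched as follows. This is an illustrative example rather than the notebook's code: the column names (`start_time`, `duration_sec`), the sample trips, and the `LOCKDOWN_START` cut-off date are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample trips; the real ridership data has its own columns.
trips = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-07-01 08:15", "2019-07-01 17:40", "2019-07-06 13:05",
        "2020-04-06 08:20", "2020-04-11 14:30",
    ]),
    "duration_sec": [600, 900, 1800, 1200, 2400],
})

# Derive the time intervals used for grouping.
trips["month"] = trips["start_time"].dt.to_period("M")
trips["weekday"] = trips["start_time"].dt.day_name()
trips["hour"] = trips["start_time"].dt.hour

trips_per_month = trips.groupby("month").size()
trips_per_weekday = trips.groupby("weekday").size()
trips_per_hour = trips.groupby("hour").size()

# Assumed cut-off date for the before/after lockdown comparison.
LOCKDOWN_START = pd.Timestamp("2020-03-17")
trips["period"] = (trips["start_time"] >= LOCKDOWN_START).map(
    {False: "pre_lockdown", True: "post_lockdown"}
)

# Average trip duration in minutes for each period.
avg_duration_min = trips.groupby("period")["duration_sec"].mean() / 60
print(avg_duration_min)
```

Each grouped series can be passed to matplotlib or seaborn to plot the pattern for that interval.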

The final "Modelling" section involved splitting the data 70-15-15 into training, validation, and testing sets. Only the most significant features, such as temperature, humidity, and wind speed, were used in the model. We used one-hot encoding to count the trips under the different weather conditions of "rain, fog, haze, thunderstorm, snow, and clear". Finally, we used a Linear Regression model for training, validation, and testing.
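A minimal sketch of this workflow with scikit-learn is shown below. It uses synthetic data, and the column names, target formula, and random seeds are assumptions made for the example, not values from the project. The 70-15-15 split is obtained by calling `train_test_split` twice: first holding out 30%, then halving that hold-out into validation and test sets.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the cleaned dataframe (assumed columns).
rng = np.random.default_rng(0)
n = 200
conditions = ["rain", "fog", "haze", "thunderstorm", "snow", "clear"]
data = pd.DataFrame({
    "temperature": rng.uniform(-10, 30, n),
    "humidity": rng.uniform(20, 100, n),
    "wind_speed": rng.uniform(0, 40, n),
    "weather": rng.choice(conditions, n),
})
# Made-up linear relationship plus noise, for illustration only.
data["trips"] = (
    50 + 10 * data["temperature"] - 0.5 * data["humidity"]
    - data["wind_speed"] + rng.normal(0, 5, n)
)

# One-hot encode the categorical weather condition.
X = pd.get_dummies(data.drop(columns="trips"), columns=["weather"], dtype=float)
y = data["trips"]

# 70% training, then split the remaining 30% evenly: 15% validation, 15% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
val_r2 = model.score(X_val, y_val)    # R^2 on the validation set
test_r2 = model.score(X_test, y_test)  # R^2 on the held-out test set
print(f"validation R^2: {val_r2:.3f}, test R^2: {test_r2:.3f}")
```

The validation score is the one to consult while choosing features; the test score is best left for a final check.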

Python Libraries

datetime, descartes, folium, geopandas, json, libspatialindex-dev, matplotlib, numpy, os, pandas, requests, rtree, seaborn, sklearn, warnings

Credits

Kevin Yang - https://github.com/D-D-Dean
Dean Bu - https://github.com/kaeyang
Eric Wu - https://github.com/ZenHWu
