GDAC Capstone

This is the capstone project for the Google Data Analytics Certificate available through Coursera

For my capstone, I chose to complete Track 1 with the Cyclistic bike-share case study.

Background information about the case study can be found in the info folder.

Note About Code: I tried to included the package name the first time I use a function. Like this rio::import.

Question to Answer

How do annual Member and Casual Riders use Cyclistic bikes differently?

You will produce a report with the following deliverables:

A clear statement of the business task
A description of all data sources used
Documentation of any cleaning or manipulation of data
A summary of your analysis
Supporting visualizations and key findings
Your top three recommendations based on your analysis

Business Task

The business task is to compare Casual and Member Riders to help develop a strategy to convert Casual Riders to Member Riders.

Data Source:

This data is made available by Divvy Bikes for use, the name of the company has just been changed. Data is available here.

# import 1 file to see structure
march_2021 <- rio::import(here::here("bike_data", "202103-divvy-tripdata.csv"), 
                          setclass = "tibble")

# create list of all file names
bike_files <- list.files(here("bike_data"), pattern = ".csv", full.names = TRUE)

# import all files and combine into one tibble
bikes_raw <- bike_files %>% 
  lapply(import, setclass = "tibble") %>% 
  dplyr::bind_rows()

rio::export(bikes, file = here("bike_data", "all_bikes.rds"))

Data Wrangling

bikes <- bikes_raw %>% 
  dplyr::mutate(ride_length = as.numeric(difftime(ended_at, started_at, units = "mins")),
                day = lubridate::wday(started_at, label = TRUE)) %>% 
  mutate(member_casual = case_when(member_casual == "casual" ~ "Casual", 
                                   TRUE ~ "Member"))

bikes_sum <- bikes %>% 
  dplyr::group_by(member_casual) %>% 
  dplyr::summarize(number = n(), across(ride_length, mean))

bike_sum2 <- bikes %>% 
  group_by(day, member_casual) %>% 
  summarize(number = n(), across(ride_length, mean))

Comparing Casual and Member Riders

(sum_gt <- bikes_sum %>% 
  gt::gt() %>% 
  gt::cols_label(member_casual = md("Membership Type<br> "),
                 number = md("Number of Rides<br> "),
                 ride_length = md("Average Ride Length<br>(minutes)")) %>% 
  gt::cols_width(everything() ~ px(120)) %>% 
  cols_width(3 ~ px(200)) %>% 
  gt::cols_align(everything(), align = "center") %>% 
  gt::fmt_number(columns = 2, decimals = 0) %>% 
  gt::fmt_number(columns = 3, decimals = 2) %>% 
  gtExtras::gt_theme_guardian()
)

gt::gtsave(sum_gt, here("outputs", "membership_comparison.png"))

While there were more rides by Members in the past year, the average ride length for Casual Riders was 2.4 times longer than that of Member Riders.

(sum_gg <- bike_sum2 %>% 
  ggplot(aes(day, number, fill = member_casual)) + 
  geom_line(aes(group = member_casual), 
            alpha = 0.25, size = 0.5, show.legend = FALSE) + 
  geom_point(aes(size = ride_length), shape = 21, 
             color = "black", show.legend = FALSE) +
  scale_y_continuous(labels = scales::comma_format()) +
  scale_fill_manual(values = c("#E84D45", "#A1BAAC")) + 
  annotate(geom = "text", color = "#A1BAAC", 
           x = 3.5, y = 500000, size = 6,
           label = "Member Riders") + 
  annotate(geom = "text", color = "#E84D45",
           x = 3.5, y = 300000, size = 6,
           label = "Casual Riders") + 
  labs(x = NULL, y = NULL, 
       title = "Number of Riders Per Day of the Week",
       subtitle = "  Larger Circles Indicate Longer Rides") + 
  theme_bw() +
  theme(plot.title.position = "plot", 
        text = element_text(size = 17),
        panel.grid.minor = element_blank())
)

ggsave(sum_gg, filename = here("outputs", "comparison_plot.png"))

Member Riders primarily rode during the week, and had shorter rides. In contrast, Casual Riders primarily rode on the weekends, and the rides were longer.

Recommendations

Based on this analysis, I recommend the following possible implementations:

A shorter time for Casual Riders before they have to pay an additional fee. This could incentivize frequent Casual Riders to upgrade to Member Riders.
A promotion for Casual Riders to have cheaper rates on the weekend.
A different pricing strategy that incentivizes Casual Riders to convert to Member Riders if they do ride during the week.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
info		info
outputs		outputs
.gitignore		.gitignore
GDAC.Rproj		GDAC.Rproj
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GDAC Capstone

Question to Answer

Business Task

Data Source:

Data Wrangling

Comparing Casual and Member Riders

Recommendations

About

Releases

Packages

simscr/GDAC

Folders and files

Latest commit

History

Repository files navigation

GDAC Capstone

Question to Answer

Business Task

Data Source:

Data Wrangling

Comparing Casual and Member Riders

Recommendations

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages