
Toy data science project using baseball data to predict a pitcher's next pitch


benwolbransky/Baseball_Toy_Project


MLB Pitch Prediction Project

Introduction

This repository contains code and analysis for predicting a pitcher's next pitch using pitch-by-pitch data from Major League Baseball (MLB). The project aims to develop a predictive model and provide insights into a pitcher's behavior.

Project Structure

The project is structured as follows:

  • sandbox_and_EDA notebook: Initial exploratory data analysis (EDA) to understand the distribution of the dataset's features and become familiar with the data.

  • pitch_data.py: Module containing a function that loads the dataset for use in the other notebooks. The module includes docstrings and exception handling to demonstrate data engineering skills.

  • dummy_results.ipynb: Development of a dummy classifier to establish baseline predictions and set performance goals. Results and code for the dummy classifier can be found in this notebook.

  • models_and_features.ipynb: Implementation of the predictive model and assessment of evaluation metrics. The notebook presents a model with an accuracy of 0.48 and outlines next steps for improvement.
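The pitch_data.py bullet describes a loader with docstrings and exception handling. The sketch below shows what such a function might look like; it is not the repository's code. The function name load_pitch_data, the CSV format, and the required column names are all hypothetical assumptions, and only the Python standard library is used.

```python
import csv
from pathlib import Path

# Hypothetical column names; the real dataset's schema may differ.
REQUIRED_COLUMNS = {"pitcher_id", "pitch_type"}


def load_pitch_data(path):
    """Load pitch-by-pitch rows from a CSV file (hypothetical loader).

    Args:
        path: Path to a CSV file with one row per pitch.

    Returns:
        A list of dicts, one per pitch, keyed by column name.

    Raises:
        FileNotFoundError: If ``path`` does not exist.
        ValueError: If the file is missing a required column.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Pitch data file not found: {path}")

    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file; treat that as "no columns".
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"Missing required columns: {sorted(missing)}")
        return list(reader)
```

Failing early with a specific exception type lets a notebook report a bad path or schema clearly instead of erroring later during feature engineering.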
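The dummy_results.ipynb bullet describes a baseline against which a model's accuracy (such as the 0.48 reported above) can be judged. One common dummy strategy is to always predict the most frequent class, which is what scikit-learn's DummyClassifier does with strategy="most_frequent". The sketch below implements that idea in plain Python; which strategy the notebook actually uses is not stated here, and the pitch-type labels are made up for illustration.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return the most frequent pitch type in the training labels."""
    if not train_labels:
        raise ValueError("train_labels must not be empty")
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up pitch-type labels, not the project's data.
train = ["FF", "FF", "SL", "CH", "FF", "SL"]
test = ["FF", "SL", "FF", "CU", "CH"]

guess = majority_class_baseline(train)
baseline_acc = accuracy(test, [guess] * len(test))
print(guess, baseline_acc)  # FF 0.4
```

A model is only useful if it beats this number on the same held-out data, so the baseline should be computed on the same split used to evaluate the model.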

Usage

To replicate the analysis or explore the project, follow these steps:

  1. Open the Jupyter notebooks in the order listed above.

  2. Execute the code cells to reproduce the analysis and view the results.

Next Steps

For further development and enhancement of the project, see the next steps outlined in the models_and_features.ipynb notebook.

Acknowledgments

Thank you for your time in reviewing this project.
