This repository contains code and analysis for predicting a pitcher's next pitch using pitch-by-pitch data from Major League Baseball (MLB). The project aims to develop a predictive model and provide insights into a pitcher's behavior.
The project is structured as follows:
- sandbox_and_EDA notebook: Initial exploratory data analysis (EDA) to understand the distribution of the dataset's features and become familiar with the data.
- pitch_data.py: Module containing a function that loads the dataset for use in the notebooks. The module includes docstrings and exception handling to demonstrate data engineering skills.
- dummy_results.ipynb: Development of a dummy classifier to establish baseline predictions and set performance goals. This notebook contains the code and results for the dummy classifier.
- models_and_features.ipynb: Implementation of the predictive model and assessment of its evaluation metrics. The notebook presents a model with an accuracy of 0.48 and outlines next steps for improvement.
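The loading and baseline steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from pitch_data.py or dummy_results.ipynb: the function names `load_pitches` and `majority_baseline_accuracy`, the CSV file format, and the `pitch_type` column are all assumptions made for the example. It also uses a plain majority-class baseline from the standard library rather than whatever dummy classifier the notebook actually uses.

```python
import csv
from collections import Counter
from pathlib import Path


def load_pitches(path):
    """Load pitch rows from a CSV file as a list of dicts.

    Hypothetical stand-in for the loader in pitch_data.py; the real
    function's name, signature, and file format may differ.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Pitch data file not found: {csv_path}")
    with csv_path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"No pitch rows found in {csv_path}")
    return rows


def majority_baseline_accuracy(train_labels, test_labels):
    """Return the most common training label and the accuracy obtained
    by predicting that label for every test example."""
    if not train_labels or not test_labels:
        raise ValueError("Both label lists must be non-empty")
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, hits / len(test_labels)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not project data.
    train = ["FF", "FF", "SL", "CH"]
    test = ["FF", "SL", "FF", "FF"]
    print(majority_baseline_accuracy(train, test))
```

A baseline of this kind gives the reference point against which the model's 0.48 accuracy should be read: the model is only useful to the extent that it beats always predicting the most frequent pitch.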
To replicate the analysis or explore the project, follow these steps:
- Open the Jupyter notebooks in the order listed above.
- Execute the code cells to reproduce the analysis and view the results.
For further development and enhancement of the project, see the next steps outlined in the models_and_features.ipynb notebook.
Thank you for taking the time to review this project.