Predicting the Outcome of NHL Games
Numbers and statistics have come to play a dominant role in sports. Moneyball, the 2011 movie about building a competitive baseball team through statistical analysis, shows one way that statistics, and to an extent machine learning, can be used to predict and optimize outcomes in sports (in that case, choosing players for a team).
In this project I attempt to predict the outcome of NHL games using machine learning tools, specifically the Random Forest Classifier.
The datasets I use are from www.hockey-reference.com, part of the www.sports-reference.com family of sites. Sports-Reference has excellent data for several sports, including:
College Football (and NFL)
College Basketball (and NBA)
Olympic Sports
NHL
The huge amount of data available on these sites is an excellent resource for practicing machine learning, as much of it is already cleaned and well organized.
Here are the direct links if you wish to view the two datasets I use to build my model:
Games 2014 data: http://www.hockey-reference.com/leagues/NHL_2014_games.html
Standings 2013 data: http://www.hockey-reference.com/leagues/NHL_2013_standings.html
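To illustrate how the two datasets could be combined, here is a minimal sketch using Pandas. The tiny tables, the column names (visitor, home, points and so on) and the build_features helper are all assumptions made for illustration; they are not the actual headers on hockey-reference.com and not necessarily the features used in the notebook. The idea is to derive the label to predict (did the home team win?) from the games table and one simple feature (did the home team finish higher in the previous season?) from the standings table.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables. The column names are
# assumptions, not the real headers from hockey-reference.com.
games = pd.DataFrame({
    "visitor": ["Team A", "Team B", "Team C"],
    "visitor_goals": [2, 4, 1],
    "home": ["Team B", "Team C", "Team A"],
    "home_goals": [3, 1, 5],
})
standings = pd.DataFrame({
    "team": ["Team A", "Team B", "Team C"],
    "points": [62, 77, 51],  # made-up previous-season points
})


def build_features(games, standings):
    """Add a label column and one feature column to a copy of the games table."""
    prev_points = standings.set_index("team")["points"]
    out = games.copy()
    # Label: True when the home team scored more goals than the visitor.
    out["home_win"] = out["home_goals"] > out["visitor_goals"]
    # Feature: True when the home team had more points last season.
    out["home_ranked_higher"] = (
        out["home"].map(prev_points) > out["visitor"].map(prev_points)
    )
    return out


features = build_features(games, standings)
print(features[["visitor", "home", "home_win", "home_ranked_higher"]])
```

The real tables would most likely need some cleaning first (renaming columns, parsing dates, handling teams that changed names between seasons) before a lookup like this works.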
This project was completed in Python with Jupyter Notebook, using Pandas, NumPy and scikit-learn.
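For readers new to these tools, here is a rough sketch of what training and evaluating a Random Forest Classifier with scikit-learn can look like. The data below is randomly generated placeholder data, not real NHL results, and the two feature names, the win probabilities and the hyperparameters are assumptions chosen only to make the example run; the notebook itself may use different features and settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_games = 200

# Synthetic placeholder features (1 = yes, 0 = no); these are illustrative only.
home_ranked_higher = rng.integers(0, 2, size=n_games)
home_won_last = rng.integers(0, 2, size=n_games)
X = np.column_stack([home_ranked_higher, home_won_last])

# Synthetic labels: the home team wins a bit more often when it is ranked higher.
win_probability = np.where(home_ranked_higher == 1, 0.65, 0.45)
y = (rng.random(n_games) < win_probability).astype(int)

# Evaluate with 5-fold cross-validation so the score does not depend on a
# single train/test split.
clf = RandomForestClassifier(n_estimators=100, random_state=14)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"Mean cross-validated accuracy: {mean_accuracy:.3f}")
```

With real game data, X would hold the features engineered from the games and standings tables, and y would hold whether the home team won each game.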
This project was a great learning experience, as it was my first attempt at using GitHub. For starters, I have simply uploaded the Jupyter/IPython notebook to this repository in case you would like to replicate the project and learn from it as well!