Lesson_4: Data Visualization
Let's walk through a few examples:
-
We'll be using some new packages, so make sure you have them installed:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # pyplot (not the top-level package) provides plt.scatter and plt.hist
-
Read in some NCAA data. Use head() and describe() to see what your data looks like:
df = pd.read_csv('../input/2013_NCAA_Game.csv')
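Here is a rough sketch of what that first look involves. It uses a small made-up frame in place of the CSV: the column names are borrowed from this lesson, but every value is invented for illustration, so your real output will differ.

```python
import pandas as pd

# Made-up stand-in for the NCAA file; all values are invented.
sample = pd.DataFrame({
    'Team Score': [72, 65, 80, 58, 91, 77],
    'Team Margin': [5, -3, 12, -10, 20, 2],
    'Team Avg Scoring Margin': [4.1, -1.2, 8.5, -6.3, 11.0, 0.7],
})

first_rows = sample.head(3)   # first 3 rows (head() defaults to 5)
summary = sample.describe()   # count, mean, std, min, quartiles, max per numeric column

print(first_rows)
print(summary)
```

On the real data, df.head() and df.describe() work the same way. describe() only summarizes numeric columns by default, so text columns such as team names will not appear in it.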
-
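A new way to look at your data is a scatter matrix, which plots every numeric column against every other one. Try out the function with some different options:
pd.plotting.scatter_matrix(df)  # in current pandas the function lives in pd.plotting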
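As one possible set of options to start from, the sketch below passes alpha, figsize, diagonal, and hist_kwds to the scatter matrix. The data is random noise generated just for the example (only the column names come from this lesson), so expect shapeless clouds rather than real patterns.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Random stand-in data; only the column names are taken from the lesson.
rng = np.random.default_rng(0)
sample = pd.DataFrame(
    rng.normal(size=(50, 3)),
    columns=['Team Score', 'Team Margin', 'Team Avg Scoring Margin'],
)

axes = scatter_matrix(
    sample,
    alpha=0.5,                # semi-transparent points so overlaps stay visible
    figsize=(8, 8),           # overall figure size in inches
    diagonal='hist',          # histogram of each column on the diagonal
    hist_kwds={'bins': 10},   # extra arguments forwarded to those histograms
)
```

The call returns a grid of matplotlib axes, one per pair of columns, so you can still tweak individual panels afterwards. In a plain script, follow it with matplotlib's plt.show() to display the figure.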
-
Now try using a histogram to view a single series
plt.hist(df['Team Avg Scoring Margin'])
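If you want more control over the histogram, a minimal sketch along these lines sets the number of bins and labels the axes. The series values are invented, and the y-axis label assumes one row per team, which may not match the real file.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented values standing in for df['Team Avg Scoring Margin'].
margins = pd.Series(
    [4.1, -1.2, 8.5, -6.3, 11.0, 0.7, 3.3, -2.8],
    name='Team Avg Scoring Margin',
)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(margins, bins=4)   # counts per bin and the bin edges
ax.set_xlabel('Team Avg Scoring Margin')
ax.set_ylabel('Number of teams')              # assumes one row per team
```

Changing bins is the quickest way to see how much the shape of a histogram depends on that choice.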
-
Matplotlib offers several different plot types. Try some out:
plt.scatter(df['Team Score'], df['Team Margin'])
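To compare a couple of plot types side by side, something like the following sketch puts a scatter plot and a box plot in one figure. The numbers are made up; swap in columns from df once you have it loaded.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values; the column names match the ones used in this lesson.
sample = pd.DataFrame({
    'Team Score': [72, 65, 80, 58, 91, 77],
    'Team Margin': [5, -3, 12, -10, 20, 2],
})

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: relationship between two columns
ax_scatter.scatter(sample['Team Score'], sample['Team Margin'])
ax_scatter.set_xlabel('Team Score')
ax_scatter.set_ylabel('Team Margin')

# Right panel: spread of a single column
ax_box.boxplot(sample['Team Margin'].to_numpy())
ax_box.set_title('Team Margin')

fig.tight_layout()
```

Working with the axes objects returned by plt.subplots makes it easy to label each panel separately.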
Last class we practiced joining tables with SQL. Now let's try it with pandas! There is a file in the input folder with player data. Try loading it and joining it to our team data, df.
pf = pd.read_csv('../input/clean_player_data.csv')
This is data I've partially cleaned. If you'd like more practice, try cleaning the original file:
pf = pd.read_csv('../input/player_data.csv')
The goal is to add extra data to our original df. Once you've added some columns, use our new visualization techniques to explore the new data.
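One way to approach the join, sketched here on tiny made-up frames: the 'Team', 'Player', and 'Points' columns and the new column names are hypothetical, so check the real files for the key you can actually join on. Because there are many players per team, the sketch first collapses the player data to one row per team and then merges, which keeps the team table from growing duplicate rows.

```python
import pandas as pd

# Hypothetical stand-ins for df and pf; names and values are invented.
teams = pd.DataFrame({
    'Team': ['Team A', 'Team B', 'Team C'],
    'Team Score': [72, 65, 80],
})
players = pd.DataFrame({
    'Team': ['Team A', 'Team A', 'Team B', 'Team C', 'Team C'],
    'Player': ['P1', 'P2', 'P3', 'P4', 'P5'],
    'Points': [18, 12, 20, 15, 9],
})

# Collapse to one row per team before joining.
per_team = (
    players.groupby('Team')
    .agg(top_scorer_points=('Points', 'max'), roster_size=('Player', 'count'))
    .reset_index()
)

# how='left' keeps every team row, like a SQL LEFT JOIN.
merged = teams.merge(per_team, on='Team', how='left')
print(merged)
```

The how argument accepts 'inner', 'left', 'right', and 'outer', which map onto the SQL join types from last class.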
You'll need more data for our upcoming NCAA project, so try finding a new data source and integrating it into our current data!
or
Try out some of the other libraries listed below. Personally, I would start with Vincent!