Lesson_4: Data Visualization

First we'll practice some data visualization!

Let's walk through a few examples:

We'll be using some new packages; make sure you have them installed:

import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np
Read in some NCAA data. Use head() and describe() to see what your data looks like.

df = pd.read_csv('../input/2013_NCAA_Game.csv')
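If the CSV isn't at hand, head() and describe() can be tried on a small stand-in first. This is a minimal sketch: the column names come from this lesson, but the numbers are made up for illustration and are not real 2013 NCAA results.

```python
import pandas as pd

# Hypothetical stand-in for 2013_NCAA_Game.csv: real column names, made-up values
df = pd.DataFrame({
    "Team Score": [70, 65, 82, 58],
    "Team Margin": [5, -3, 12, -10],
    "Team Avg Scoring Margin": [4.2, -1.5, 8.0, -6.3],
})

print(df.head())          # first rows (up to 5 by default)
summary = df.describe()   # count, mean, std, min, quartiles and max of each numeric column
print(summary)
```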
A new way to look at your data is a scatter matrix. Try out the function with some different options:

pd.plotting.scatter_matrix(df)
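A few of the options pd.plotting.scatter_matrix accepts are shown below: alpha (point transparency), figsize (size of the whole grid in inches), diagonal ("hist" or "kde"; "kde" also needs SciPy) and marker. This is a sketch on made-up values; the matplotlib.use("Agg") line is only there so it runs as a script without a display, and can be dropped in a notebook. The function returns an n-by-n array of matplotlib Axes, one per pair of columns.

```python
import matplotlib
matplotlib.use("Agg")  # script-only: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the NCAA DataFrame
df = pd.DataFrame({
    "Team Score": [70, 65, 82, 58, 77],
    "Team Margin": [5, -3, 12, -10, 7],
    "Team Avg Scoring Margin": [4.2, -1.5, 8.0, -6.3, 3.1],
})

axes = pd.plotting.scatter_matrix(
    df,
    alpha=0.5,        # transparency helps when points overlap
    figsize=(8, 8),   # size of the whole grid in inches
    diagonal="hist",  # plot on the diagonal: "hist" or "kde"
    marker="o",       # marker style for the off-diagonal scatter plots
)
```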
Now try using a histogram to view a single series:

plt.hist(df['Team Avg Scoring Margin'])
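plt.hist also takes a bins argument and returns the bar heights and bin edges, which is handy for checking what the plot actually shows. A sketch on hypothetical margin values; pandas offers an equivalent Series.hist method.

```python
import matplotlib
matplotlib.use("Agg")  # script-only: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, not real NCAA margins
margins = pd.Series([4.2, -1.5, 8.0, -6.3, 3.1, 0.5, -2.2, 9.4],
                    name="Team Avg Scoring Margin")

# plt.hist returns the bar heights, the bin edges and the drawn bar patches
counts, edges, patches = plt.hist(margins, bins=4, edgecolor="black")
plt.xlabel(margins.name)
plt.ylabel("Count")

# The pandas equivalent would be: margins.hist(bins=4)
```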
Matplotlib contains several different plotting options; try some out:

plt.scatter(df['Team Score'], df['Team Margin'])
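One way to compare several plot types side by side is plt.subplots, which returns a figure and a set of axes to draw on. The sketch below puts a scatter plot, a line plot and a box plot next to each other; the data is a made-up stand-in, and the choice of plot types is just an example.

```python
import matplotlib
matplotlib.use("Agg")  # script-only: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Team Score": [70, 65, 82, 58, 77],
    "Team Margin": [5, -3, 12, -10, 7],
})

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4))

# Scatter plot: relationship between two columns
ax1.scatter(df["Team Score"], df["Team Margin"])
ax1.set_xlabel("Team Score")
ax1.set_ylabel("Team Margin")
ax1.set_title("Scatter")

# Line plot: scores sorted from lowest to highest
ax2.plot(sorted(df["Team Score"]), marker="o")
ax2.set_title("Sorted scores")

# Box plot: median, quartiles and spread of the margins
ax3.boxplot(df["Team Margin"].to_numpy())
ax3.set_title("Margins")

fig.tight_layout()
```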

Classwork

Last class we practiced using SQL to join tables. Now let's try it using pandas! There is a file in the input folder with player data. Try loading the data and joining it to our team data, df.

pf = pd.read_csv('../input/clean_player_data.csv')

This is data I've partially cleaned. If you'd like more practice, try cleaning the original file:

pf = pd.read_csv('../input/player_data.csv')
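What cleaning involves depends on the real file, which isn't shown here. The sketch below assumes a few typical problems (padded header names, trailing spaces in team names, a text placeholder such as "DNP" in a numeric column, and a duplicated row) and fixes each with a standard pandas call. The CSV content and the column names Team, Player and Points are hypothetical.

```python
import io
import pandas as pd

# Hypothetical raw CSV, not the real player_data.csv
raw = io.StringIO(
    " Team ,Player, Points \n"
    "Team A ,P1,18\n"
    "Team A ,P1,18\n"
    "Team B,P2,DNP\n"
)
pf = pd.read_csv(raw)

pf.columns = pf.columns.str.strip()                          # " Team " -> "Team"
pf["Team"] = pf["Team"].str.strip()                          # "Team A " -> "Team A"
pf["Points"] = pd.to_numeric(pf["Points"], errors="coerce")  # "DNP" -> NaN
pf = pf.drop_duplicates().reset_index(drop=True)             # drop the repeated row
print(pf)
```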

The goal is to add extra data to our original df. Once you've added some columns, use our new visualization techniques to see what new data you have.
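The pandas counterpart of a SQL JOIN is merge. Here is a minimal sketch, assuming both tables share a team-name column, called Team here. That name is hypothetical, so check the real headers with df.columns and pf.columns.

Player data usually has several rows per team, so the sketch first aggregates players per team with groupby (roughly SQL's GROUP BY). Each team row then matches only once. Merging the raw player rows instead would repeat each team once per player.

Listing the columns that exist only after the merge shows which ones to explore with scatter_matrix or hist.

```python
import pandas as pd

# Hypothetical team data (one row per team)
df = pd.DataFrame({"Team": ["Team A", "Team B", "Team C"],
                   "Team Score": [70, 65, 82]})

# Hypothetical player data (several rows per team)
pf = pd.DataFrame({"Team": ["Team A", "Team A", "Team B", "Team C", "Team C"],
                   "Player": ["P1", "P2", "P3", "P4", "P5"],
                   "Points": [18, 12, 20, 15, 9]})

# Roughly: SELECT Team, SUM(Points), COUNT(Player) FROM pf GROUP BY Team
player_totals = (pf.groupby("Team")
                   .agg(Total_Points=("Points", "sum"),
                        Num_Players=("Player", "count"))
                   .reset_index())

# Roughly: SELECT * FROM df LEFT JOIN player_totals ON df.Team = player_totals.Team
merged = df.merge(player_totals, on="Team", how="left")

# Columns added by the join, to feed into the plots from earlier
new_cols = [c for c in merged.columns if c not in df.columns]
print(merged)
print(new_cols)
```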

Once you've done this, submit your new df by storing it in your output folder.
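Saving the result is a single to_csv call. The sketch wraps it in a small hypothetical helper, save_output, that creates the folder if it doesn't exist. The demo writes to a temporary folder because the exact path of your output folder isn't given here, so substitute your own.

```python
import os
import tempfile
import pandas as pd

def save_output(df, output_dir, filename="ncaa_with_players.csv"):
    """Hypothetical helper: write df to output_dir/filename as CSV and return the path."""
    os.makedirs(output_dir, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(output_dir, filename)
    df.to_csv(path, index=False)            # index=False: don't write the row index as a column
    return path

# Demo in a temporary folder; pass your own output folder instead
df = pd.DataFrame({"Team": ["Team A", "Team B"], "Team Score": [70, 65]})
path = save_output(df, tempfile.mkdtemp())
print(pd.read_csv(path))
```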

Extra:

You'll need more data for our upcoming NCAA project. Try getting a new data source and integrating it into our current data!
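When a new source names teams differently from your data, rows can silently fail to match. One approach is sketched below, using a hypothetical extra table and hypothetical column names (team_name, Seed). It normalizes the keys, then does an outer merge with indicator=True. This adds a _merge column saying whether each row was found in both tables ("both"), only the left ("left_only"), or only the right ("right_only").

```python
import pandas as pd

df = pd.DataFrame({"Team": ["Team A", "Team B", "Team C"],
                   "Team Score": [70, 65, 82]})
# Hypothetical new source whose team names are formatted differently
extra = pd.DataFrame({"team_name": ["team a", "TEAM B ", "Team D"],
                      "Seed": [1, 4, 8]})

# Normalize the join keys so "TEAM B " and "Team B" match
df["key"] = df["Team"].str.strip().str.lower()
extra["key"] = extra["team_name"].str.strip().str.lower()

combined = df.merge(extra, on="key", how="outer", indicator=True)

# Rows not marked "both" were found in only one table: inspect them before trusting the join
unmatched = combined[combined["_merge"] != "both"]
print(unmatched[["key", "_merge"]])
```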

or

Try out some of the other libraries listed below; personally, I would start with Vincent!
