
Lesson_4: Data Visualization

dhercher edited this page Mar 5, 2014 · 7 revisions

First we'll practice some data visualization!

Let's walk through a few examples:

  1. We'll be using some new packages, so make sure you have them installed:

    import os
    import matplotlib.pyplot as plt  # the plotting functions live in matplotlib.pyplot
    import numpy as np
    import pandas as pd

  2. Read in some NCAA data. Use head() and describe() to see what your data looks like:

    df = pd.read_csv('../input/2013_NCAA_Game.csv')

  3. A scatter matrix is a new way to look at your data. Try out the function with some different options:

    pd.plotting.scatter_matrix(df)

  4. Now try using a histogram to view a single series:

    df['Team Avg Scoring Margin'].hist()

  5. Matplotlib contains several different plotting options. Try some out:

    plt.scatter(df['Team Score'], df['Team Margin'])
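Steps 2 through 5 can be tried end to end even without the class CSV. The sketch below is a minimal example, assuming a small made-up table named demo that borrows the column names used above. The values are random and are not NCAA data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to plot in a window or notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for 2013_NCAA_Game.csv (assumption: random values, not real results)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Team Score": rng.integers(50, 100, size=40),
    "Team Margin": rng.integers(-25, 26, size=40),
    "Team Avg Scoring Margin": rng.normal(0, 8, size=40),
})

print(demo.head())      # first five rows
print(demo.describe())  # count, mean, std, min, quartiles, max per column

# Scatter matrix: every numeric column plotted against every other one
axes = pd.plotting.scatter_matrix(demo, figsize=(8, 8), diagonal="hist")

# Histogram of a single series
fig_hist, ax_hist = plt.subplots()
ax_hist.hist(demo["Team Avg Scoring Margin"], bins=10)
ax_hist.set_xlabel("Team Avg Scoring Margin")

# Scatter plot of one column against another
fig_sc, ax_sc = plt.subplots()
ax_sc.scatter(demo["Team Score"], demo["Team Margin"])
ax_sc.set_xlabel("Team Score")
ax_sc.set_ylabel("Team Margin")
```

To look at the figures, remove the backend line and call plt.show(), or save one with fig_sc.savefig(...) using a file name of your choice.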

Classwork

Last class we practiced joining tables with SQL. Now let's try it using pandas! There is a file in the input folder with player data. Try loading the data and joining it to our team data, df.

    pf = pd.read_csv('../input/clean_player_data.csv')

This is data I've partially cleaned. If you'd like more practice, try cleaning the original file:

    pf = pd.read_csv('../input/player_data.csv')

The goal is to add extra data to our original df. Once you've added some columns, use our new visualization techniques to see what new data you have.
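If you're not sure where to start, here is one possible shape for the join. It is a sketch on two tiny made-up tables: teams stands in for df and players stands in for pf. All column names here ("Team", "Player", "Points") are assumptions, so check the real files for the actual key column before copying anything.

```python
import pandas as pd

# Hypothetical team table standing in for df
teams = pd.DataFrame({
    "Team": ["A", "B", "C"],
    "Team Score": [71, 64, 80],
})

# Hypothetical player table standing in for pf
players = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C"],
    "Player": ["p1", "p2", "p3", "p4", "p5"],
    "Points": [12.5, 8.0, 15.0, 6.5, 20.0],
})

# Collapse players to one row per team so the join does not duplicate team rows
per_team = players.groupby("Team", as_index=False).agg(
    top_scorer_points=("Points", "max"),
    avg_player_points=("Points", "mean"),
)

# Left join keeps every team row, like a SQL LEFT JOIN on the Team column
merged = teams.merge(per_team, on="Team", how="left")
print(merged)
```

Aggregating before the merge is a design choice: joining the raw player rows directly would give one row per player, which may or may not be what you want for team-level plots.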

Once you've done this, submit your new df by storing it in your output folder.
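One way to store a DataFrame is to_csv. The sketch below uses a stand-in table, and both the folder path and the file name are assumptions. Point them at your own output folder.

```python
import os
import pandas as pd

out_dir = "output"  # assumption: change to your own output folder, e.g. '../output'
os.makedirs(out_dir, exist_ok=True)

# Stand-in for your joined df
result = pd.DataFrame({"Team": ["A", "B"], "Team Score": [71, 64]})

out_path = os.path.join(out_dir, "team_with_players.csv")  # hypothetical file name
result.to_csv(out_path, index=False)  # index=False leaves out the row-number column

# Read the file back to confirm it was written as expected
check = pd.read_csv(out_path)
print(check.head())
```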

Extra:

You'll need more data for our upcoming NCAA project. Try getting a new data source and integrating it into our current data!

or

Try out some of the other libraries listed below. Personally, I would start with Vincent!

Some Resources: