Added a script to automatically pull 2010 population data and add to spreadsheet #6
…ntaining population data.
OK. First, I'm new to Git; second, I know nothing about Python; I'm just an end user of applied statistical applications. I have been working with the original spreadsheet and U.S. Census records, where I merged in some population data as well as percentages for the white, black, and Hispanic population of the town, city, or place referenced in the original file.
@fine101w The discrepancies in the places are certainly an issue. I think using county-level population would probably be best, since it would capture those smaller-level geographies. If you're familiar with GIS, it shouldn't be too difficult to take the FIPS code and join it to a census shapefile, and then do a spatial join so that you have the number of cases per race, per county. Another problem with both the script and the dataset is the place names. Because the place names in the CSV aren't always an exact match to what the Census uses, quite a few cases are dropped and have to be checked manually.
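For the "cases per race, per county" tally mentioned above, a minimal sketch might look like the following. It assumes each record has already been assigned a county FIPS code (for example, by the spatial join), and the field names `county_fips` and `race` are hypothetical, not taken from the actual CSV. The sample rows are made up.

```python
from collections import Counter


def cases_per_race_per_county(records):
    """Count records for each (county FIPS, race) pair.

    `records` is an iterable of dicts; the keys "county_fips" and "race"
    are hypothetical names chosen for this illustration.
    """
    counts = Counter()
    for rec in records:
        counts[(rec["county_fips"], rec["race"])] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = [
    {"county_fips": "48201", "race": "W"},
    {"county_fips": "48201", "race": "B"},
    {"county_fips": "48201", "race": "W"},
    {"county_fips": "06037", "race": "H"},
]
counts = cases_per_race_per_county(sample)
```

Keeping the FIPS code as a string preserves leading zeros, which matters when joining to Census tables later.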
Thanks! Appreciate the suggestions. Frankly, I have not gone the FIPS-joined-to-shapefile route; will look into it. Would love more granularity than county. I also wonder (as it appears others have) about any additional variables that WaPo has not incorporated. As you note, it's a somewhat problematic data set, including the misnamed places (besides typos, I found a ghost town in Texas when hunting for, I think, Fuqua). But it could be somewhat interesting once populated with a few of these additional race/ethnicity measures. Other issues abound, e.g. 'armed': many values make sense; a few, not so much. I have tried a recoded version, but the critical dichotomy will be unarmed (yes/no). The problem is that some of the 'armed' values are only weakly the case, or strange (stapler?). Here's the current categorization (the 'other' was Taser, I think; will probably place those records with firearm): [attached image] Thanks again, df
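One way the 'armed' recode described above could be sketched is shown here. The category names and the set of values treated as firearms are assumptions made for illustration; the commenter's actual categorization was in an attached image and is not reproduced. Grouping Taser with firearm follows the commenter's stated intention.

```python
# Assumption: values grouped with "firearm"; Taser is included because the
# comment above says those records will probably be placed with firearm.
FIREARM_VALUES = {"gun", "taser"}


def recode_armed(value):
    """Return (category, unarmed) for a raw 'armed' value.

    category is one of "unarmed", "firearm", "other", "unknown"
    (hypothetical labels). unarmed is True/False, or None when the
    raw value is missing.
    """
    cleaned = (value or "").strip().lower()
    if cleaned == "":
        return "unknown", None
    if cleaned == "unarmed":
        return "unarmed", True
    if cleaned in FIREARM_VALUES:
        return "firearm", False
    # Everything else (knife, stapler, ...) falls into a catch-all bucket.
    return "other", False
```

For example, `recode_armed("stapler")` falls into the catch-all bucket, which keeps the weak or odd cases visible for manual review rather than silently mixing them with firearms.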
This will be useful for those who want to use the shooting data for research as rate calculation will be simplified.
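As a rough illustration of the rate calculation this enables, the sketch below computes cases per 100,000 residents. The function name and the choice of a per-100,000 scale are assumptions for the example, not part of the script.

```python
def rate_per_100k(cases, population):
    """Cases per 100,000 residents; None if population is missing or zero."""
    if not population:
        return None
    return cases * 100_000 / population
```

For instance, 5 cases in a place with 250,000 residents gives a rate of 2.0 per 100,000.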
A new spreadsheet will be created containing the following fields:
Simply run the script by entering: python get_population.py, and it will iterate through the rows and query the US Census Bureau for the data. The Census API is slow, and because there are over 1,000 different towns, processing the entire spreadsheet will take some time.
Upon completion the script will report which towns it had trouble finding data for, so that the user can manually search if desired.
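The flow described above (iterate over rows, look up each town, report the ones that could not be found) might be sketched as follows. This is not the code in get_population.py: the function `add_population`, the `lookup` callable, and the `city`/`state` field names are hypothetical, and the lookup is stubbed with made-up numbers instead of calling the Census API.

```python
def add_population(rows, lookup):
    """Attach a population to each row and collect towns with no match.

    `lookup(city, state)` is a caller-supplied function that returns an
    int population, or None when no match is found.
    """
    enriched = []
    not_found = []
    for row in rows:
        population = lookup(row["city"], row["state"])
        if population is None:
            not_found.append((row["city"], row["state"]))
        enriched.append({**row, "population": population})
    return enriched, not_found


# Stub lookup with made-up populations, standing in for the real query.
FAKE_POPULATIONS = {("Springfield", "XX"): 12345}


def fake_lookup(city, state):
    return FAKE_POPULATIONS.get((city, state))


rows = [
    {"city": "Springfield", "state": "XX"},
    {"city": "Misspeled Town", "state": "XX"},
]
enriched, not_found = add_population(rows, fake_lookup)
for city, state in not_found:
    print(f"No population data found for {city}, {state}")
```

Passing the lookup in as a function keeps the loop testable without network access, and a cache could be wrapped around it so repeated towns are queried only once.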
This was done in reference to issue #5.