Added a script to automatically pull 2010 population data and add to spreadsheet #6

minorsecond · 2016-09-25T10:52:06Z

This will be useful for those who want to use the shooting data for research, as it simplifies rate calculation.

A new spreadsheet will be created containing the following fields:

total population
white population
black population
asian population
hispanic population

To run the script, enter: python get_population.py. It will iterate through the rows and query the US Census Bureau for the data. The Census API is slow, and because the spreadsheet covers over 1,000 towns, processing all of it will take some time.

Upon completion the script will report which towns it had trouble finding data for, so that the user can manually search if desired.
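For readers who want a feel for the flow described above, here is a minimal sketch of the row-by-row lookup and the end-of-run report. It is not the code in get_population.py: the `lookup` callable stands in for the Census query, the `city` and `state` column names and the output field names are assumptions, and the population figures are made-up placeholders.

```python
# Sketch only: mirrors the described flow, not the actual script in this PR.
FIELDS = ["total", "white", "black", "asian", "hispanic"]


def add_population(rows, lookup):
    """Return (enriched_rows, missed_places).

    `lookup(city, state)` is a hypothetical callable that returns a dict
    keyed by FIELDS, or None when the place cannot be found.
    """
    enriched, missed, cache = [], [], {}
    for row in rows:
        key = (row["city"], row["state"])
        # The API is slow, so query each distinct place only once.
        if key not in cache:
            cache[key] = lookup(*key)
        pop = cache[key]
        if pop is None and key not in missed:
            missed.append(key)
        out = dict(row)
        for field in FIELDS:
            # Leave the cell empty when no data was found.
            out[field + "_population"] = "" if pop is None else pop[field]
        enriched.append(out)
    return enriched, missed


def fake_lookup(city, state):
    """Stand-in for the Census query; the numbers are placeholders."""
    table = {("Exampleville", "TX"): dict(zip(FIELDS, [1000, 600, 200, 50, 150]))}
    return table.get((city, state))


sample_rows = [
    {"city": "Exampleville", "state": "TX"},
    {"city": "Nowhere", "state": "TX"},
    {"city": "Exampleville", "state": "TX"},
]
enriched_rows, missed_places = add_population(sample_rows, fake_lookup)
for city, state in missed_places:
    print(f"No population data found for {city}, {state}")
```

Caching by (city, state) keeps the number of API calls down to the number of distinct places rather than the number of rows.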

This was done in reference to issue #5.


fine101w · 2016-10-05T22:19:51Z

OK...first, I'm new to Git; second, I know nothing about Python; I'm just an end user of applied statistical applications. I have been working with the original spreadsheet and U.S. Census records (where I merged in some population data as well as %'s for white, black and Hispanic pop for the town/city or place referenced in the original file).
The issue I'm having is less about programming and more about goal/intent. When reviewing the 'places' named, one can see that some are towns, others are cities, there are also unincorporated areas...and even a few counties. In addition, some of the places are actually neighborhoods within large urban areas, e.g. Los Angeles and Knoxville. I think these different levels of geography undercut efforts to generate rates. There's a denominator problem.
My interest in % racial/ethnic composition involves using those data for a multi-level modeling analysis. (The distribution of fatalities within the US is not random, so comparing the fatalities data set to population totals is flawed. We've done this sort of thing with some other health outcomes.)
Now, my wish list actually would be to get some ZIP code at the center/centroid or almost anywhere within each geoplace, and then use those ZIPs to merge in urban-rural status (urban/large town/small town/remote). The population #'s will not work because some of those neighborhoods/areas listed are small, but are really part of urban clusters. The ZIP codes could be used to merge in an old but excellent national database (RUCA, or rural-urban commuting area codes) that has been used a lot...but never mind.

minorsecond · 2016-10-05T22:39:22Z

@fine101w The discrepancies in the places are certainly an issue. I think using county-level population would probably be best since it would capture those smaller-level geographies. If you're familiar with GIS, it shouldn't be too difficult to take the FIPS and join it to a census shapefile, and then do a spatial join so that you have the number of cases per race, per county.
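As a rough illustration of the tabular half of that idea (it does not do the shapefile or spatial join itself), the sketch below counts cases per county and race and turns them into rates per 100,000. It assumes each case has already been assigned a county FIPS code; the `county_fips` and `race` keys, the FIPS value, and the population figures are all hypothetical placeholders.

```python
from collections import Counter


def cases_per_county_race(cases):
    """Count cases for each (county FIPS, race) pair."""
    counts = Counter()
    for case in cases:
        counts[(case["county_fips"], case["race"])] += 1
    return counts


def rate_per_100k(counts, population):
    """Convert counts to rates per 100,000 using county-level denominators.

    Pairs with a missing or zero population are skipped, not guessed.
    """
    rates = {}
    for key, n in counts.items():
        pop = population.get(key)
        if pop:
            rates[key] = n * 100_000 / pop
    return rates


# Placeholder data: "99001" is not a real county, and the populations are invented.
sample_cases = [
    {"county_fips": "99001", "race": "B"},
    {"county_fips": "99001", "race": "B"},
    {"county_fips": "99001", "race": "W"},
]
sample_population = {("99001", "B"): 50_000, ("99001", "W"): 200_000}

county_counts = cases_per_county_race(sample_cases)
county_rates = rate_per_100k(county_counts, sample_population)
```

Keeping FIPS codes as strings preserves leading zeros, which matters when joining to Census tables.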

Another problem with the script and with the dataset is the place names. Because the place names in the CSV aren't always an exact match to what the Census uses, there are quite a few dropped cases that have to be manually checked.
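One possible way to cut down on the dropped cases is to normalize names and fall back to fuzzy matching. The sketch below uses the standard library's difflib; it is not what the script currently does, and the list of trailing designations and the 0.85 cutoff are assumptions that would need tuning against the real places file.

```python
import difflib
import re

# Trailing designations assumed to appear on Census place names.
DESIGNATION = re.compile(r"\s+(city|town|village|cdp|borough)$")


def clean(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()


def census_key(name):
    """Clean a Census-style name and strip one trailing designation."""
    return DESIGNATION.sub("", clean(name))


def match_place(name, census_names, cutoff=0.85):
    """Return the best-matching Census name, or None if nothing is close."""
    index = {census_key(c): c for c in census_names}
    hits = difflib.get_close_matches(clean(name), list(index), n=1, cutoff=cutoff)
    return index[hits[0]] if hits else None


census_names = ["Houston city", "Austin city", "Kansas City city"]
```

The designation is stripped only on the Census side so that a name such as "Kansas City" in the CSV is not shortened by mistake. Anything that returns None would still go on the manual-check list.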

fine101w · 2016-10-05T23:42:34Z

Thanks! Appreciate the suggestions. Frankly, I have not gone the FIPS-joined-to-shapefile route. Will look into it.

Would love more granularity than county. And also wonder (as it appears others have) about any additional variables that WaPo has not incorporated.

As you note, it's a bit of a problematic data set, including the misnamed places (besides typos, found a ghost town in Texas when hunting for, I think, Fuqua).

But, it could be somewhat interesting once populated with a few of these additional race/ethnicity measures.

Other issues abound, e.g. 'armed'...many values make sense; a few, not so much. Have tried a recoded version...but the critical dichotomy will be unarmed (yes/no). Problem is some of the 'armed' are rather weakly the case or strange (stapler?). Here's the current categorization (the 'other' was Taser, I think...will probably place those recs with firearm):

[Image attachment: current categorization of 'armed' values; not preserved]

thanks again,

df


minorsecond added 8 commits September 24, 2016 13:20

Added add'l files

550c7bc

Added code to match city in shootings data to city in places.csv

eceea66

Code now pulls from Census and builds a dictionary of dictionaries co…

9a9aa6d

…ntaining population data.

Working code

a0cd9be

Added more text output to report status.

8dca89b

Added placeholder for API key

cd91e62

Update

e954555

Added .gitignore

9ebe561

jmuyskens closed this Aug 17, 2020
