Skip to content

This repository holds data, scrapers, and analyses for a reporting assignment.

License

Notifications You must be signed in to change notification settings

cmgsalazar/mott-haven-apartments

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Mott Haven (Port Morris) apartments

This repository contains data scraped and analyzed from various sources in relation to luxury apartments and new developments in the Port Morris / Mott Haven neighborhood in the South Bronx.

Some data scraping and PDF extraction work were done in partial fulfillment of the requirements for the Practical Python for News Investigations class under Sandeep Junnarkar at the Craig Newmark Graduate School of Journalism at CUNY. Notebooks and csv files labeled regex utilized the said library instead of re as part of the class requirements.

The resulting data analyses will be used for a reporting assignment for the Mott Haven Herald.

Each branch in the repository corresponds to a data source.

Data sources

Content

mns-rental market

The monthly-reports and yearend-reports folders hold PDF files downloaded, while the datasets folder holds csv files created by extracting text from the PDF files.

For files that use a different format or variables that returned None, I decided to manually fetch the data points I need and add them to the datasets.

streateasy-buildings

From the main list of buildings scraped from StreetEasy, I filtered data to show only buildings built in 2019 onward and within a particular area (with lat, lng coordinates).

Process

I mostly used BeautifulSoup to scrape webpages and wget and os to download files.

There are two scrapers for the StreetEasy webpages: one utilized async Playwright — which I then fed to BeautifulSoup for parsing (H/T to Jonathan Soma for this advice!) — and the other used BeautifulSoup and headers.

To extract text from PDFs, I used pymupdf4llm. There are two versions for the MNS reports: one used re and the other used regex for regular expressions searches.

About

This repository holds data, scrapers, and analyses for a reporting assignment.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published