Refactor scraper to handle exceptions #1

albertlyu · 2014-03-23T14:43:11Z

Currently, the scraper works as intended for 13 of the 30 team pages. Of the remaining 17 pages, 8 pull data successfully but swap the employee and title fields because the HTML tags are switched, 2 produce incomplete output with missing employees, 1 raises an IndexError, 1 raises a UnicodeEncodeError, and 5 have significantly different HTML structures that will require separate scraping code.

Handle all of these exceptions so that all 30 team pages can be scraped into a single CSV file with the same fields for every team (team, department, subdepartment, employee, title).

See https://github.com/albertlyu/mlb-front-offices/blob/master/mlbfrontoffice_scraper.py#L23-L48.
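
One possible shape for this, as a minimal sketch rather than the actual fix: run one parser per team, catch the failure types listed above so a single bad page does not stop the run, and write everything through one CSV writer. All names here (`scrape_all`, `swap_employee_title`, `write_csv`, and the parser signature `parser(team, html)` returning a list of dicts) are hypothetical and are not taken from `mlbfrontoffice_scraper.py`. The sketch assumes Python 3.

```python
import csv

# The five columns named in this issue.
FIELDS = ["team", "department", "subdepartment", "employee", "title"]


def swap_employee_title(parser):
    """Wrap a parser for pages whose employee/title tags are switched."""
    def wrapped(team, html):
        rows = parser(team, html)
        return [dict(r, employee=r["title"], title=r["employee"]) for r in rows]
    return wrapped


def scrape_all(pages, parsers, default_parser):
    """Parse every team page; collect rows and record failures per team.

    pages:   dict mapping team name -> raw HTML string
    parsers: dict mapping team name -> team-specific parser (hypothetical)
    """
    rows, failures = [], {}
    for team, html in pages.items():
        parser = parsers.get(team, default_parser)
        try:
            # Build the full list first so a failure mid-page
            # does not leave partial rows in the output.
            team_rows = list(parser(team, html))
        except (IndexError, KeyError, ValueError) as exc:
            failures[team] = repr(exc)
            continue
        rows.extend(team_rows)
    return rows, failures


def write_csv(rows, stream):
    """Write all rows with one fixed header so every team has the same columns."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

When writing to disk, opening the file with `open(path, "w", newline="", encoding="utf-8")` should avoid the UnicodeEncodeError for names with non-ASCII characters. The `failures` dict gives a list of teams that still need a dedicated parser.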

albertlyu added the bug label Mar 23, 2014

albertlyu mentioned this issue Mar 23, 2014

Write 'date added' and 'active' indicator variable to csv #3

Open
