Refactor scraper to handle exceptions #1

Open
albertlyu opened this issue Mar 23, 2014 · 0 comments
@albertlyu (Owner)

The scraper currently works as intended for 13 of the 30 team pages. The remaining 17 break down as follows:

- 8 pages pull data successfully, but the employee and title fields come out swapped because those pages put the HTML tags in the opposite order.
- 2 pages produce incomplete output with missing employees.
- 1 page raises an `IndexError`.
- 1 page raises a `UnicodeEncodeError`.
- 5 pages use significantly different HTML structures and will need separate scraping code.
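One possible way to structure the refactor is sketched below. It is a minimal sketch, not the current scraper's code: it assumes Python 3, assumes the pages have already been fetched as HTML strings keyed by team, and every name in it (`scrape_all`, `fix_swapped`, `custom_parsers`, `swapped_teams`, `toy_parser`) is hypothetical. Each page is parsed inside its own `try`/`except` so one failing page cannot abort the run, the teams with very different layouts can be routed to their own parser, and known swapped pages get employee and title flipped back. It does not address the 2 pages with missing employees; those would still need parser fixes.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, str]
# Hypothetical parser signature: (team, html) -> rows keyed by
# team/department/subdepartment/employee/title.
Parser = Callable[[str, str], List[Row]]


def fix_swapped(rows: List[Row]) -> List[Row]:
    """Swap employee and title back for pages whose HTML lists them in reverse order."""
    return [dict(r, employee=r["title"], title=r["employee"]) for r in rows]


def scrape_all(
    pages: Dict[str, str],
    default_parser: Parser,
    custom_parsers: Dict[str, Parser],
    swapped_teams: Set[str],
) -> Tuple[List[Row], Dict[str, str]]:
    """Parse every page, recording failures instead of letting one page abort the run."""
    rows: List[Row] = []
    failures: Dict[str, str] = {}
    for team, html in pages.items():
        # Teams with very different layouts get their own parser (hypothetical).
        parser = custom_parsers.get(team, default_parser)
        try:
            team_rows = parser(team, html)
        except (IndexError, UnicodeEncodeError) as exc:
            failures[team] = f"{type(exc).__name__}: {exc}"
            continue
        if team in swapped_teams:
            team_rows = fix_swapped(team_rows)
        rows.extend(team_rows)
    return rows, failures


def toy_parser(team: str, html: str) -> List[Row]:
    """Stand-in parser: each line is 'department|subdepartment|employee|title'."""
    out = []
    for line in html.splitlines():
        parts = line.split("|")
        # Indexing a short line raises IndexError, like a malformed page would.
        out.append({"team": team, "department": parts[0], "subdepartment": parts[1],
                    "employee": parts[2], "title": parts[3]})
    return out


pages = {
    "A": "Baseball Ops|Scouting|Jane Doe|Scout",
    "B": "Baseball Ops|Scouting|Director|John Roe",  # employee/title reversed
    "C": "broken page",
}
rows, failures = scrape_all(pages, toy_parser, {}, swapped_teams={"B"})
```

Here team C ends up in `failures` with its `IndexError`, while A and B both yield clean rows. Keeping the failures in a dict instead of printing them makes it easy to see which pages still need attention after each run.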

Handle all of these exceptions so that all 30 team pages can be scraped into a single CSV file in which every row has the same five columns: team, department, subdepartment, employee, and title.
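A minimal sketch of the output step, assuming Python 3's standard `csv` module (the 2014 script may target Python 2, where `csv` handles Unicode differently). `write_rows` and `write_csv` are hypothetical names. `csv.DictWriter` with a fixed `fieldnames` list guarantees the same column order for every team, `restval=""` fills in fields a page lacks (such as a missing subdepartment level), and `extrasaction="raise"` turns unexpected keys into an error. Opening the file with an explicit UTF-8 encoding is one common way to avoid `UnicodeEncodeError` on non-ASCII names; whether that is the cause of the error on the failing page is a guess.

```python
import csv
import io
from typing import IO, Dict, Iterable

# Column order requested in the issue.
FIELDS = ["team", "department", "subdepartment", "employee", "title"]


def write_rows(rows: Iterable[Dict[str, str]], fileobj: IO[str]) -> None:
    """Write rows under a fixed header; missing fields become empty cells."""
    writer = csv.DictWriter(
        fileobj,
        fieldnames=FIELDS,
        restval="",            # e.g. a page with no subdepartment level
        extrasaction="raise",  # an unexpected key is treated as a bug
    )
    writer.writeheader()
    writer.writerows(rows)


def write_csv(rows: Iterable[Dict[str, str]], path: str) -> None:
    # newline="" is what the csv docs recommend for files; an explicit UTF-8
    # encoding lets non-ASCII names be written regardless of the OS locale.
    with open(path, "w", newline="", encoding="utf-8") as f:
        write_rows(rows, f)


buf = io.StringIO()
write_rows([{"team": "A", "department": "Baseball Ops",
             "employee": "José Pérez", "title": "Scout"}], buf)
print(buf.getvalue())
```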

See https://github.com/albertlyu/mlb-front-offices/blob/master/mlbfrontoffice_scraper.py#L23-L48.
