Scrape business details from https://www.yellowpages.com for a given keyword, place, and count, using Python, lxml, and multiprocessing, and export the results to a CSV file.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
The script extracts the following fields from the search page and from each business's own page:
From search page | From business page |
---|---|
ID | |
Business name | Years in business |
Phone | General info |
Page (href) | Category |
Address | Neighborhoods |
Website | Services |
Rating | |
This script is built with Python 3 and the following modules:
- requests -- Calls the Yellow Pages URLs
- lxml -- Parses the returned HTML so the fields can be extracted
- unicodecsv -- Exports the data to a CSV file
- argparse -- Handles the arguments passed to the script
- math -- Calculates the number of search pages
- urllib3 -- Suppresses HTTPS warnings
- multiprocessing -- Runs multiple processes so the script finishes faster
- time -- Measures the time the script takes to finish
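As a rough illustration of how `math` and `multiprocessing` can fit together, the sketch below turns a business-card count into a number of search pages and hands one job per page to a process pool. It is not the script's actual code: `RESULTS_PER_PAGE`, the URL query parameters, and the function names `page_count`, `build_url`, `scrape_page`, and `plan_jobs` are all assumptions made for this example, and the worker only builds the URL instead of fetching it.

```python
import math
from multiprocessing import Pool

# Assumption: 30 cards per search page. This README does not state the value.
RESULTS_PER_PAGE = 30


def page_count(count, per_page=RESULTS_PER_PAGE):
    """Return how many search pages are needed to cover `count` cards."""
    return math.ceil(count / per_page)


def build_url(keyword, place, page):
    """Build a search URL. The query parameter names are an assumption."""
    return (
        "https://www.yellowpages.com/search"
        f"?search_terms={keyword}&geo_location_terms={place}&page={page}"
    )


def scrape_page(job):
    """Placeholder worker: a real worker would request and parse the URL."""
    keyword, place, page = job
    return build_url(keyword, place, page)


def plan_jobs(keyword, place, count):
    """Create one (keyword, place, page) job per search page."""
    return [(keyword, place, page) for page in range(1, page_count(count) + 1)]


if __name__ == "__main__":
    jobs = plan_jobs("digital+agency", "Los+Angeles,+CA", 64)
    # Each page is handled by a separate worker process.
    with Pool(processes=2) as pool:
        for url in pool.map(scrape_page, jobs):
            print(url)
```

Under the assumed page size, a count of 64 yields three jobs, one per search page.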
Run the script by name, followed by the positional arguments keyword, place, and count. The script works with both lowercase and uppercase input 👍. The count argument is the number of business cards in the search results; the script uses it to loop over all business cards related to the keyword and place. Here is an example that finds the business details for a digital agency in Los Angeles, CA:
python yellow_pages.py digital+agency Los+Angeles,+CA 64
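The three positional arguments in the command above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's own parser: the function names `build_parser` and `parse_cli` are hypothetical, and lowercasing the keyword and place is only one assumed way of making input case-insensitive.

```python
import argparse


def build_parser():
    """Declare the three positional arguments: keyword, place, count."""
    parser = argparse.ArgumentParser(
        description="Scrape business details from Yellow Pages"
    )
    parser.add_argument("keyword", help="search keyword, e.g. digital+agency")
    parser.add_argument("place", help="search location, e.g. Los+Angeles,+CA")
    parser.add_argument(
        "count", type=int, help="number of business cards in the search results"
    )
    return parser


def parse_cli(argv=None):
    """Parse `argv` (defaults to sys.argv[1:]) into (keyword, place, count)."""
    args = build_parser().parse_args(argv)
    # Assumption: normalize case so "Digital+Agency" and "digital+agency" match.
    return args.keyword.lower(), args.place.lower(), args.count
```

Calling `parse_cli(["digital+agency", "Los+Angeles,+CA", "64"])` returns the keyword, the place, and the count as an integer; omitting an argument makes `argparse` print a usage message and exit.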
This will create a CSV file: Sample output
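For a sense of what the export step might look like, here is a small sketch that writes rows with the column names from the field table. It uses the standard library `csv` module as a stand-in for `unicodecsv`, and the column order, the `write_rows` helper, and the sample row are all made up for illustration rather than taken from the script.

```python
import csv
import io

# Column names follow the field table in this README; the order is an assumption.
FIELDS = [
    "ID", "Business name", "Phone", "Page (href)", "Address", "Website",
    "Rating", "Years in business", "General info", "Category",
    "Neighborhoods", "Services",
]


def write_rows(rows, stream):
    """Write a header plus one line per business; missing fields stay empty."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample row, not real scraped data.
    sample = [{"ID": 1, "Business name": "Example Agency", "Rating": "4.5"}]
    buffer = io.StringIO()
    write_rows(sample, buffer)
    print(buffer.getvalue())
```

To write to disk instead of a string buffer, pass a file opened with `open("output.csv", "w", newline="")`; the file name here is only a placeholder.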
Code copyright 2019. Released under the MIT License.