This web scraper uses the BeautifulSoup and Selenium WebDriver libraries to fetch the following data from a Costco product page and load it into a CSV file:
- SEO Meta Tags
- Product Name
- Product Description
- Product Specifications
- Category
- Price
- Embedded images
This script ONLY works for the Costco website. It will break for any other website.
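Because the script only handles Costco pages, it can help to check a link before adding it to a run. The following is a minimal sketch, not part of the project: the helper name `is_costco_url` is hypothetical, and it assumes the target site is served from `costco.com` or one of its subdomains.

```python
from urllib.parse import urlparse


def is_costco_url(url: str) -> bool:
    """Return True if the URL's host is costco.com or a subdomain of it.

    Assumption: the supported site lives under costco.com. A URL without
    a scheme (no "https://") has no parsed host and is rejected.
    """
    host = (urlparse(url.strip()).hostname or "").lower()
    return host == "costco.com" or host.endswith(".costco.com")
```

With this sketch, a link such as `https://www.costco.com/some-product.html` passes, while a link on any other host is rejected.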
These instructions will get you a copy of the project up and running on your local machine for testing purposes.
- Python 3.7.0. During installation, choose "Default Path" and check the box to install pip as well.
Once Python 3.7 is installed:
- WebDriver. Please install the Chrome version!
- Selenium:
pip install selenium
- BeautifulSoup:
pip install beautifulsoup4
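To confirm that both packages were installed for the Python interpreter you will run the script with, a quick check like the one below may help. It is only a sketch and not part of the project; the helper name `missing_modules` is hypothetical. Note that the `beautifulsoup4` package is imported under the name `bs4`.

```python
import importlib.util

# Import names, not pip names: the beautifulsoup4 package is imported as bs4.
REQUIRED_MODULES = ["selenium", "bs4"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```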
- In the DriverPath.txt file, paste the path of the WebDriver you installed above, for example:
C:\Users\DAE\Downloads\Chromedriver
- If you installed a driver other than Chrome, open Scrape.py and change line 27, which by default reads:
driver = webdriver.Chrome(path_to_driver)
- For Firefox:
driver = webdriver.Firefox(executable_path=path_to_driver)
- For Safari:
driver = webdriver.Safari(executable_path=path_to_driver)
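If the driver fails to start, the contents of DriverPath.txt are a likely culprit (stray quotes, blank lines, trailing spaces). The sketch below shows one way such a file could be read defensively. It is an illustration only: how Scrape.py actually reads the file is not shown here, and the helper name `read_driver_path` is hypothetical.

```python
from pathlib import Path


def read_driver_path(config_file="DriverPath.txt"):
    """Return the first non-empty line of the file, cleaned up.

    Surrounding whitespace and quotation marks are removed. The
    utf-8-sig codec also tolerates the byte-order mark that some
    Windows editors add at the start of a file.
    """
    text = Path(config_file).read_text(encoding="utf-8-sig")
    for line in text.splitlines():
        cleaned = line.strip().strip("\"'")
        if cleaned:
            return cleaned
    raise ValueError(f"{config_file} does not contain a driver path")
```

Printing the result of `read_driver_path()` from the repository directory shows exactly what path would be handed to the driver.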
For every iteration of scraping:
- In the URLS.txt file, delete all the URLs currently there.
- Paste 10 new links, each on its own line, without quotation marks.
- On the command line, go to the directory of the GitHub repository by running:
cd /d C:\Users\DAE\Documents\CostcoScrape\costco-scrape-master
- On the command line, start the script by running:
python scrape.py
- That should run without any errors! If there are any, something is probably wrong with steps 2-3.
- Open the OutputData.csv file and voila, all the data from the above 10 links is loaded!
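The steps above can be bracketed by two small checks: one on URLS.txt before the run, one on OutputData.csv after it. This is a sketch under assumptions and not part of the project: the helper names `load_urls` and `count_data_rows` are hypothetical, and it assumes the first row of OutputData.csv is a header and that both files are UTF-8 text.

```python
import csv
from pathlib import Path


def load_urls(url_file="URLS.txt"):
    """Return the non-empty lines of the URL file.

    Surrounding whitespace and quotation marks are stripped from each line.
    """
    urls = []
    for line in Path(url_file).read_text(encoding="utf-8-sig").splitlines():
        cleaned = line.strip().strip("\"'")
        if cleaned:
            urls.append(cleaned)
    return urls


def count_data_rows(csv_file="OutputData.csv"):
    """Count the rows after the first one (assumed to be a header row)."""
    with open(csv_file, newline="", encoding="utf-8-sig") as handle:
        rows = list(csv.reader(handle))
    return max(len(rows) - 1, 0)
```

If `len(load_urls())` before the run and `count_data_rows()` after it do not both come out to 10, one of the links was probably malformed or failed to scrape.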
Congratulations!
- CHUDDY