Amazon Data Science Books; Analysis & Visualizations

Folder Structure

csv_files contains the processed and unprocessed CSV files.
notebooks contains all the .ipynb files, including the notebook used to preprocess the data.
scraper contains scraper.py, the script used to scrape the data from Amazon.

Problem statement

The goal of this project is to gather information about Data Science related books from Amazon. There are a total of 1351 entries in the csv_files/amazon_data_science_books.csv file.
We then used the scraped data to explore the following distributions and correlations in a Tableau dashboard:

A doughnut chart showing the number of books published by the top 15 publishers versus all other publishers
A bar chart of the top 15 publishers by number of books published
Average price of books by the top 15 publishers
Price range of books
Pages vs Price trend
Top books by user reviews (rating 4.0 - 5.0)
Average reviews of the top 15 publishers

Findings and Observations from the Dashboard

Note: Try viewing the Dashboard in Full Screen mode.

Of the 1324 books remaining after preprocessing, 948 were published by just 15 publishers.
Packt has published the most books.
Springer has the highest average price.
As the page count increases, the price of a book tends to increase.
Most books are priced between 14.00 and 60.00 USD.
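
The publisher figures above come from the Tableau dashboard. As a rough cross-check, the same kind of summary could be sketched in plain Python. The column names "publisher" and "price" are assumptions (the actual CSV headers may differ), and the rows below are made-up toy values, not project data.

```python
from collections import Counter, defaultdict
from statistics import mean


def top_publisher_share(rows, n=15, publisher_key="publisher"):
    """Return (top n publishers with counts, books covered by them, total books)."""
    counts = Counter(row[publisher_key] for row in rows if row.get(publisher_key))
    top = counts.most_common(n)
    covered = sum(count for _, count in top)
    return top, covered, sum(counts.values())


def average_price_by_publisher(rows, publisher_key="publisher", price_key="price"):
    """Return a dict mapping each publisher to the mean price of its books."""
    prices = defaultdict(list)
    for row in rows:
        prices[row[publisher_key]].append(float(row[price_key]))
    return {publisher: mean(values) for publisher, values in prices.items()}


# Toy rows for illustration only (hypothetical column names and values).
rows = [
    {"publisher": "Packt", "price": "30"},
    {"publisher": "Packt", "price": "40"},
    {"publisher": "Packt", "price": "50"},
    {"publisher": "Springer", "price": "80"},
    {"publisher": "Springer", "price": "100"},
    {"publisher": "Other Press", "price": "20"},
]

top, covered, total = top_publisher_share(rows, n=2)
avg = average_price_by_publisher(rows)
print(top, covered, total)
print(avg)
```

With the real file, the rows could come from csv.DictReader and n would stay at 15 to mirror the dashboard.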

You can visit the public dashboard here

First look at the dashboard
Also, try clicking the bars on the bar plots, and see the changes.

Build from Sources and run the selenium driver

Clone the repo

git clone https://github.com/Tasfiq-K/amazon-data-science-books-analysis.git

Initialize and activate a virtual environment
If you are running Python 3.4+, you can use the venv module baked into Python:

python -m venv <directory name>

For example, if you name your directory 'venv', run:

python -m venv venv

To activate the virtual environment, run:
On Windows

# In cmd.exe
venv\Scripts\activate.bat
# In PowerShell
venv\Scripts\Activate.ps1

On Linux or macOS

$ source venv/bin/activate
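
To confirm that the activation took effect before installing anything, a small check like the following may help. It is only a sketch: it relies on the fact that, inside an environment created with venv, sys.prefix differs from sys.base_prefix. The function name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # In a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python install.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```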

Install dependencies

pip install -r requirements.txt

Download Webdriver
Download the web driver of your choice. I used geckodriver, which works with the Firefox browser. You can download it from here

Run the scraper

python scraper.py --geckodriver_path <path_to_geckodriver>
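
For readers curious how a script might consume the --geckodriver_path flag, here is a minimal sketch. It is not the actual scraper.py: the helper names parse_args and make_driver are hypothetical, marking the flag as required is an assumption, and the driver setup uses the Selenium 4 Service-based API.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flag shown above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Scrape book listings with Firefox.")
    parser.add_argument(
        "--geckodriver_path",
        required=True,  # assumption: the real script may provide a default instead
        help="Path to the geckodriver executable",
    )
    return parser.parse_args(argv)


def make_driver(geckodriver_path):
    """Start Firefox through geckodriver (Selenium 4 style)."""
    # Imported lazily so the argument parsing can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service

    return webdriver.Firefox(service=Service(executable_path=geckodriver_path))


# Demonstrates parsing only; the path below is a placeholder.
args = parse_args(["--geckodriver_path", "/path/to/geckodriver"])
print(args.geckodriver_path)
```

Calling make_driver(args.geckodriver_path) would open a Firefox window, so remember to call quit() on the returned driver when the run finishes.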

This produces a file named amazon_data_science_books.csv containing all the required fields and data. Alternatively, check the scraped data here
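
After a run, a quick sanity check of the output can catch an empty or truncated scrape. The sketch below uses only the standard library. The helper name summarize_csv is hypothetical, and it assumes the file is UTF-8 encoded with a header row.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file with a header."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames, len(rows)


# Only runs the check if the scraper output is present in the current directory.
output = Path("amazon_data_science_books.csv")
if output.exists():
    columns, row_count = summarize_csv(output)
    print(columns)
    print(row_count)
```

The row count can then be compared with the 1351 entries reported for csv_files/amazon_data_science_books.csv.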

Analytics

Tableau Public View: https://public.tableau.com/app/profile/tasfiq.kamran/viz/AmazonDataScienceBooksDashboard/AmazonDataScienceBooks
