Web Scraper

This scraper asks the user for a number of pages and an article type, scrapes nature.com in a single pass, and saves every article of the selected type to its own text file, organized in directories by page number.

Functions from previous stages which can be called by editing the main() function include:

get_content()
- Accept a URL and return its content in JSON format.
get_movie_title_and_description()
- Accept a URL of a movie on IMDB.com and return its title and description.
save_page_source_code()
- Accept a URL, save the source code to a file and return a status message.
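As a rough illustration of the "save the source code to a file and return a status message" behaviour, here is a minimal sketch. It is not the repository's code: the helper name `save_source`, the file path, and both message strings are assumptions made for the example. The network call is left out so the sketch runs offline; with requests, the two inputs would typically be `response.content` and `response.status_code` from `requests.get(url)`.

```python
import tempfile
from pathlib import Path


def save_source(content: bytes, status_code: int, path: Path) -> str:
    """Save page bytes to `path` if the request succeeded; return a status message.

    Hypothetical helper: the name and the message wording are illustrative only.
    """
    if status_code != 200:
        # Nothing is written when the server did not answer with 200 OK.
        return f"The URL returned {status_code}!"
    # Write raw bytes so the page's own encoding is preserved.
    path.write_bytes(content)
    return "Content saved."


# Example with made-up page content, written to a temporary directory.
demo_file = Path(tempfile.mkdtemp()) / "source.html"
print(save_source(b"<html></html>", 200, demo_file))  # Content saved.
```

Returning a message instead of raising keeps the caller in main() simple: it can print whatever comes back.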

This app was built as a JetBrains Academy project. The "Problems" directory of the repository also contains my code snippets from exercises in JetBrains Academy's Python Developer track.

External modules used

BeautifulSoup
requests

How to use

Clone the repository

git clone git@github.com:valenciarichards/web-scraper.git

Requirements

To install all necessary modules, navigate to the root directory of the repository and run:

pip install -r requirements.txt

Usage

Navigate to "Web Scraper/task" and run:

python scraper.py

Then, enter the number of pages and type of articles to scrape when prompted. The articles will be saved as individual ".txt" files organized in directories by page number.
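The output layout described above can be sketched as follows. This is an illustration under stated assumptions, not the code in scraper.py: the function names, the `Page_<n>` directory naming, and the rule for turning a title into a file name (drop punctuation, join words with underscores) are all hypothetical. Fetching and parsing are left out; the sketch starts from a dictionary that maps article titles to article text.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List


def title_to_filename(title: str) -> str:
    """Drop punctuation, join the remaining words with underscores, add .txt."""
    words = re.sub(r"[^\w\s]", "", title).split()
    return "_".join(words) + ".txt"


def save_articles(root: Path, page_number: int, articles: Dict[str, str]) -> List[Path]:
    """Write each article body to root/Page_<n>/<title>.txt and return the paths."""
    page_dir = root / f"Page_{page_number}"
    # Create the page directory even when no article on the page matched.
    page_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for title, body in articles.items():
        path = page_dir / title_to_filename(title)
        path.write_text(body, encoding="utf-8")
        saved.append(path)
    return saved


# Example: one made-up article for page 1, written to a temporary directory.
demo_root = Path(tempfile.mkdtemp())
demo_paths = save_articles(demo_root, 1, {"An example title!": "Article body."})
print(demo_paths[0].name)  # An_example_title.txt
```

Creating the directory before looping means a page with no matching articles still gets an (empty) directory, which makes it easy to see which pages were visited.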

License

The source code is released under the MIT License.
