News-WebScraper

DevOps Task

This task checks for the following skills:

Databases

Web scraping

The basics of DevOps

Scraping

Create a script that parses the site and collects news

URL: http://feeds.bbci.co.uk/news/world/us_and_canada/rss.xml

The script should be able to:

Get all URLs, titles, short descriptions, and publication datetimes
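The collection step can be sketched with the Python standard library alone. This is a minimal sketch, not the repository's implementation: it assumes the feed follows the usual RSS 2.0 layout (`item` elements with `link`, `title`, `description`, and `pubDate` children), and the names `parse_rss` and `FEED_URL` are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Feed URL taken from the task description above.
FEED_URL = "http://feeds.bbci.co.uk/news/world/us_and_canada/rss.xml"


def parse_rss(xml_text):
    """Return one dict per <item>: url, title, description, published."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        items.append({
            "url": (item.findtext("link") or "").strip(),
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
            # RSS 2.0 dates use the RFC 822 style,
            # e.g. "Mon, 01 Jan 2024 12:00:00 GMT".
            "published": parsedate_to_datetime(pub_date) if pub_date else None,
        })
    return items


# Possible usage (performs a network request, so it is left as a comment):
#   from urllib.request import urlopen
#   with urlopen(FEED_URL) as response:
#       news = parse_rss(response.read())
```

Keeping the parsing separate from the download makes the function easy to check against a saved copy of the feed.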

Additional task:

Use the collected URLs to get the body of each news item
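One way to approach the body extraction is sketched below. It assumes, without having checked the actual article markup, that the article text sits in `<p>` elements, so it will also pick up any other paragraphs on the page. `ParagraphExtractor` and `extract_body` are hypothetical names.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._current = []
        self.paragraphs = []

    def _flush(self):
        # Store the paragraph gathered so far, skipping empty ones.
        text = "".join(self._current).strip()
        if text:
            self.paragraphs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly ends an unclosed previous one.
            self._flush()
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._current.append(data)


def extract_body(html_text):
    """Return the paragraph text of a page, separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html_text)
    parser.close()
    parser._flush()  # keep text from a final <p> that was never closed
    return "\n\n".join(parser.paragraphs)
```

In practice the HTML for each collected URL would be downloaded first and then passed to `extract_body` as a decoded string.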

Database

Save all collected data in a PostgreSQL database
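A possible shape for the PostgreSQL step is sketched here. The table name `news`, its columns, and the function `save_items` are assumptions made for illustration. The function accepts any DB-API connection whose driver uses `%s` placeholders, which is the style used by `psycopg2`.

```python
# Hypothetical schema: one row per article, keyed by its URL.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS news (
    url         TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    published   TIMESTAMPTZ,
    body        TEXT
)
"""

# ON CONFLICT DO NOTHING skips rows whose URL is already stored.
INSERT_SQL = """
INSERT INTO news (url, title, description, published, body)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (url) DO NOTHING
"""


def save_items(conn, items):
    """Create the table if needed, insert the items, and commit."""
    cur = conn.cursor()
    try:
        cur.execute(CREATE_TABLE_SQL)
        for item in items:
            cur.execute(INSERT_SQL, (
                item["url"],
                item["title"],
                item.get("description"),
                item.get("published"),
                item.get("body"),
            ))
    finally:
        cur.close()
    conn.commit()


# Possible usage with psycopg2 (connection values are placeholders):
#   import psycopg2
#   conn = psycopg2.connect("dbname=news user=postgres host=localhost")
#   save_items(conn, news)
#   conn.close()
```

Using the URL as the primary key is one design choice among several; it lets the database itself reject duplicates when the scraper runs repeatedly.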

Additional task:

Save collected data in MongoDB

The basics of DevOps

Choose one database, MongoDB or PostgreSQL, to finish this part.

Create docker-compose.yml with all necessary services

Create a Python or shell script (or a combination of both) as the entry point for the user

The user should be able to:

Start database server

Create schema

Run the scraper

Retrieve data for a specific date (output: CSV file)
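The last action, exporting one day of news to CSV, might look like the sketch below. It filters an in-memory list for simplicity; with a database the same filter would more likely be a `WHERE` clause on the publication date. The function name `export_for_date` and the column list `FIELDS` are hypothetical.

```python
import csv
from datetime import date, datetime, timezone

# Columns written to the CSV file, in this order.
FIELDS = ["url", "title", "description", "published"]


def export_for_date(items, day, path):
    """Write items published on `day` to a CSV file; return the row count."""
    rows = [
        item for item in items
        if item["published"] is not None and item["published"].date() == day
    ]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops keys such as "body" not listed in FIELDS.
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            writer.writerow({**row, "published": row["published"].isoformat()})
    return len(rows)


# Possible usage:
#   export_for_date(news, date(2024, 1, 1), "news_2024-01-01.csv")
```

Note that `.date()` is taken in each datetime's own time zone, so items should be normalized to a single zone first if the day boundary matters.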

Additional task:

Create a service in docker-compose that runs cron. The cron task should start the scraper hourly and add only new news items to the database.
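The "only new news" requirement can be illustrated with a small filter. This sketch assumes the article URL identifies a news item uniquely and that the URLs already stored can be loaded into memory; `select_new` is a hypothetical helper. In crontab syntax, an hourly schedule is written as `0 * * * *`.

```python
def select_new(items, known_urls):
    """Return the items whose URL is not stored yet, keeping feed order."""
    seen = set(known_urls)
    fresh = []
    for item in items:
        url = item["url"]
        if url not in seen:
            # Remember the URL so duplicates within one run are dropped too.
            seen.add(url)
            fresh.append(item)
    return fresh


# Possible usage inside the hourly job:
#   new_items = select_new(scraped_items, urls_already_in_database)
```

A uniqueness constraint in the database gives the same guarantee on the storage side; filtering first simply avoids downloading article bodies that are already stored.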

Also create instructions.md so that a user can install and run everything you created.

It is fine if you don't know some of these things; find a tutorial or book and learn them on the fly.

Upload your project to GitHub


HalimHamidov/News-WebScraper

Folders and files

Latest commit

History

Repository files navigation

News-WebScraper

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages