Skip to content

Latest commit

 

History

History
23 lines (15 loc) · 731 Bytes

README.md

File metadata and controls

23 lines (15 loc) · 731 Bytes

Scraping

About this project

A scraper for one of Sweden's premier legal to scrape PDF URLs and metadata from this website https://www.domstol.se/hogsta-domstolen/avgoranden/.

Prerequisites

  • Python 3.11
  • ChromeDriver for selenium
  • Required Python packages (requirements.txt)

Installation

  • Create a .env file with the structure of the .env.template
  • Run make venv to install python packages
  • Run python run_dev.py to run the script

Usage

  • The scraper navigates to the specified URL, applies filters, and extracts PDF URLs from the page source.
  • It then fetches metadata for each PDF and saves it to metadata.json and metadata.csv.
  • Latest 10 PDFs are downloaded to the specified directory.