IMDB rating classifier

Overview

The application scrapes movie data from IMDB and adjusts the ratings according to a set of validation rules (review penalization).

The data is scraped from the IMDB charts API using the BeautifulSoup library.

The data structure of the parsed and normalized payload is as follows (example):

{
  "rank": "1",
  "title": "The Shawshank Redemption",
  "year": "1994",
  "rating": "9.2",
  "votes": "2,223,000",
  "url": "/title/tt0111161/",
  "oscars_won": 0,
  "penalized": false
}

We then extract the following fields into a dataframe:

- rank (int)
- title (str)
- year (int)
- rating (float)
- votes (int)
- url (str)
- oscars_won (int)
- penalized (bool)
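
The type conversions listed above can be sketched as follows. This is a minimal illustration that assumes the raw payload looks like the example shown earlier; `normalize_movie` is a hypothetical helper name, not a function from the package, and it returns a plain dict rather than a dataframe row.

```python
from typing import Any, Dict


def normalize_movie(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a raw scraped payload into the typed fields listed above.

    Hypothetical helper for illustration only; not part of the package.
    """
    return {
        "rank": int(raw["rank"]),
        "title": str(raw["title"]).strip(),
        "year": int(raw["year"]),
        "rating": float(raw["rating"]),
        # Vote counts arrive with thousands separators, e.g. "2,223,000".
        "votes": int(str(raw["votes"]).replace(",", "")),
        "url": str(raw["url"]),
        "oscars_won": int(raw["oscars_won"]),
        "penalized": bool(raw["penalized"]),
    }


# The example payload from this README.
raw = {
    "rank": "1",
    "title": "The Shawshank Redemption",
    "year": "1994",
    "rating": "9.2",
    "votes": "2,223,000",
    "url": "/title/tt0111161/",
    "oscars_won": 0,
    "penalized": False,
}

movie = normalize_movie(raw)
print(movie["votes"])  # 2223000
```

A list of such dicts could then be passed to whichever dataframe library is in use.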

Using dataclasses, we can then preprocess the data against a schema definition.

The rules are as follows:

schema = {
    "rank": {
        "type": "int",
        "min": 1,
        "max": 250,
        "required": True,
    },
    "title": {
        "type": "str",
        "required": True,
    },
    "year": {
        "type": "int",
        "min": 1900,
        "max": 2023,
        "required": True,
    },
    "rating": {
        "type": "float",
        "min": 0.0,
        "max": 10.0,
        "required": True,
    },
    "votes": {
        "type": "int",
        "min": 0,
        "required": True,
    },
    "url": {
        "type": "str",
        "required": True,
    },
    "oscars_won": {
        "type": "int",
        "min": 0,
        "required": True,
    },
    "penalized": {
        "type": "bool",
        "required": True,
    },
}
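
One way to apply such a schema to a dataclass is sketched below. It is an illustration under stated assumptions rather than the package's actual implementation: the `Movie` dataclass, the `validate` function, and the error message wording are all hypothetical, and the schema is repeated in compact form so that the snippet runs on its own.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List

# Compact copy of the schema above, repeated so the sketch is self-contained.
SCHEMA: Dict[str, Dict[str, Any]] = {
    "rank": {"type": "int", "min": 1, "max": 250, "required": True},
    "title": {"type": "str", "required": True},
    "year": {"type": "int", "min": 1900, "max": 2023, "required": True},
    "rating": {"type": "float", "min": 0.0, "max": 10.0, "required": True},
    "votes": {"type": "int", "min": 0, "required": True},
    "url": {"type": "str", "required": True},
    "oscars_won": {"type": "int", "min": 0, "required": True},
    "penalized": {"type": "bool", "required": True},
}

# Map the schema's type names to Python types.
TYPES = {"int": int, "float": float, "str": str, "bool": bool}


@dataclass
class Movie:  # hypothetical name, mirrors the fields listed in this README
    rank: int
    title: str
    year: int
    rating: float
    votes: int
    url: str
    oscars_won: int
    penalized: bool


def validate(movie: Movie, schema: Dict[str, Dict[str, Any]] = SCHEMA) -> List[str]:
    """Return a list of violations; an empty list means the record is valid."""
    errors: List[str] = []
    record = asdict(movie)
    for name, rule in schema.items():
        value = record.get(name)
        if value is None:
            if rule.get("required"):
                errors.append(f"{name}: required field is missing")
            continue
        expected = TYPES[rule["type"]]
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        wrong_bool = expected is not bool and isinstance(value, bool)
        if not isinstance(value, expected) or wrong_bool:
            errors.append(f"{name}: expected {rule['type']}")
            continue
        low, high = rule.get("min"), rule.get("max")
        if low is not None and value < low:
            errors.append(f"{name}: {value} is below the minimum {low}")
        if high is not None and value > high:
            errors.append(f"{name}: {value} is above the maximum {high}")
    return errors


valid = Movie(1, "The Shawshank Redemption", 1994, 9.2, 2223000,
              "/title/tt0111161/", 0, False)
# Rank and year are deliberately out of range here.
invalid = Movie(251, "Some title", 1890, 9.2, 10, "/title/tt0000000/", 0, False)

print(validate(valid))  # []
print(validate(invalid))
```

Returning a list of violations instead of raising on the first one makes it possible to report every problem with a record in a single pass.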

Requirements

Python>=3.8>=3.10
BeautifulSoup4
requests
pytest
tox
click
pre-commit
flake8
black
isort

and more...

Installation

For development purposes:

Clone the repository

foo@bar:~$ git clone git@github.com:marouenes/imdb-rating-classifier.git

Create a virtual environment

foo@bar:~/imdb-rating-classifier$ virtualenv .venv

Activate the virtual environment

foo@bar:~/imdb-rating-classifier$ source .venv/bin/activate

Install the dev dependencies

foo@bar:~/imdb-rating-classifier$ pip install -r requirements-dev.txt

Install the pre-commit hooks

foo@bar:~/imdb-rating-classifier$ pre-commit install

For usage:

Install the package and its dependencies in editable mode

foo@bar:~/imdb-rating-classifier$ pip install -e .

The application is also published on PyPI and can be installed using pip:

foo@bar:~$ pip install imdb-rating-classifier

Usage

Display the help message and the available commands

foo@bar:~$ imdb-rating-classifier generate --help
Usage: imdb-rating-classifier generate [OPTIONS]

  Generate the output dataset containing both the original and adjusted
  ratings.

  An extra JSON file will be generated alongside the csv file

Options:
  --output FILE               The path to the output file.
  --number-of-movies INTEGER  The number of movies to scrape.
  -h, --help                  Show this message and exit.

Run the application with the default number of movies (20) and the default output file (data.csv)

imdb-rating-classifier generate

Run the application with a specific number of movies

imdb-rating-classifier generate --number-of-movies 100

Run the application with a specific number of movies and a specific output file

imdb-rating-classifier generate --number-of-movies 100 --output some_name.csv

Testing

Run tests and pre-commit hooks

foo@bar:~/imdb-rating-classifier$ tox

CI/CD

The application is automatically packaged and distributed to PyPI. It is also tested automatically with GitHub Actions, using tox as the environment orchestrator.

