Kedro project with pipelines to fit lstm password-checker

pacifikus/pass-complexity

pass complexity

Overview

This project predicts password complexity. The target is a real value indicating how many times the password would be encountered among one million random passwords.

Data

The data were taken from the Kaggle competition "DMIA.ProductionML 2021.1 Password complexity".

Metrics

The main metric is RMSLE (root mean squared logarithmic error). RMSLE is preferable when:

  • the targets grow exponentially, such as population counts or the average sales of a commodity over a span of years;
  • we care about percentage errors rather than absolute errors;
  • the target values span a wide range and we don't want to penalize large differences when both the predicted and the actual values are large;
  • we want to penalize underestimates more than overestimates.

You can find more information here
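
As an illustration of the metric, here is a minimal sketch of RMSLE using only the standard library. The function name `rmsle` and the `log(1 + x)` form are assumptions made for this example (it is the common definition, also used by scikit-learn's `mean_squared_log_error`); the project's own implementation may differ.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if min(y_true) < 0 or min(y_pred) < 0:
        raise ValueError("RMSLE is defined only for non-negative values")
    # log1p(x) = log(1 + x), so zero targets do not produce -inf.
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Same absolute error of 50, but in different directions:
under = rmsle([100.0], [50.0])   # prediction below the target
over = rmsle([100.0], [150.0])   # prediction above the target
```

Here `under` comes out larger than `over`, which is the asymmetry described in the last bullet: an underestimate is penalized more than an overestimate of the same absolute size.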

Experiments setup

  • Hardware
    • CPU count: 1
    • GPU count: 1
    • GPU type: Tesla T4
  • Software
    • Python version: 3.7.14
    • OS: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic

| Model | RMSLE |
|---|---|
| Vanilla Linear Regression | 0.5023 |
| Random Forest Regressor | 0.5018 |
| RF Regressor with hyperparams tuning | 0.5000 |
| TF-IDF + Linear Regression | 0.4831 |
| Custom LSTM | 0.3428 |

Recommended production hardware requirements

  • Hardware
    • CPU: 4 CPU Cores
    • GPU: a single GPU with at least 4 GB of GPU RAM (optional: the model can also run inference on CPU only; see also CPU inference optimization)
    • RAM: 8 GB
    • System disk space: 2 GB

How to run

Install dependencies

First, install the project dependencies with the command:

pip install -r src/requirements.txt

Pipelines

The project includes 3 pipelines:

  • a data processing pipeline with an ETL function
  • a data science pipeline with a train-test split and model fitting
  • an inference pipeline that produces predictions for the test data

The current pipeline DAG looks like this:

Pipeline DAG
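
To make the first two stages concrete, here is a minimal sketch of what an ETL step and a train-test split step could look like as plain Python functions, which is the form Kedro nodes wrap. The function names `etl` and `split_data`, their parameters, and the toy rows are all hypothetical and are not taken from the project's code.

```python
import random


def etl(raw_rows):
    """Keep rows with a non-empty password and a non-negative target."""
    cleaned = []
    for password, times in raw_rows:
        if password and times is not None and times >= 0:
            cleaned.append((str(password), float(times)))
    return cleaned


def split_data(rows, test_size=0.2, seed=42):
    """Shuffle rows reproducibly and split them into train and test parts."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_size)))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up toy rows (password, times per million), for illustration only.
raw = [
    ("123456", 55.0),
    ("", 1.0),            # dropped: empty password
    ("qwerty", 13.0),
    ("Tr0ub4dor&3", 0.0),
    (None, 2.0),          # dropped: missing password
    ("letmein", 4.0),
    ("hunter2", 2.0),
]
cleaned = etl(raw)
train, test = split_data(cleaned, test_size=0.2, seed=0)
```

In a Kedro project, functions like these would typically be registered as nodes with named inputs and outputs, so that the model fitting and inference stages can consume the resulting datasets.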

How to run pipelines

You can run pipelines from the project with:

kedro run

How to run app

The model is served with Flask and the Waitress WSGI server.

To run the application from the existing Docker image, run

docker run -p 5001:5001 pacificus/dmia_pass_complexity

and go to http://localhost:5001/predict?password={YOUR INPUT}
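
Passwords often contain characters such as `&`, `#`, or spaces that have a special meaning in URLs, so they should be percent-encoded before being placed in the query string. The following is a minimal client sketch using only the standard library. The helper names `build_predict_url` and `predict` are hypothetical, and because the response format of the endpoint is not documented here, the response body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_predict_url(password, base_url="http://localhost:5001"):
    """Build the /predict URL with the password safely percent-encoded."""
    query = urlencode({"password": password})
    return f"{base_url}/predict?{query}"


def predict(password, base_url="http://localhost:5001", timeout=5.0):
    """Call the running service and return the raw response body as text."""
    url = build_predict_url(password, base_url)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Special characters are encoded instead of breaking the query string.
example_url = build_predict_url("p@ss w&rd#1")
```

With the container from the `docker run` command above listening on port 5001, calling `predict("p@ss w&rd#1")` sends the request and returns whatever the service responds with.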

To build your own Docker image with modifications, run the following from the project root

docker build -t dmia_pass_complexity src/pass_complexity/api

Testing

Unit testing

You can run the unit tests for the nodes with:

kedro test

Load testing

A load test written with Locust is available in src/tests/load/locustfile.py. To run it, follow these steps:

  • install Locust with pip install locust
  • go to the load tests folder
  • start the Locust web UI with the command locust
  • open http://localhost:8089/ and specify the test parameters (number of users, spawn rate, host of the running server)
  • start swarming

You can also run the load tests without the web UI; see the Locust docs.

Load testing was performed with the following configuration:

  • Hardware
    • CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2.59 GHz
    • RAM: 16 GB
    • System disk space: 20 GB
  • Testing setup
    • Number of users: 300
    • Spawn rate: 1

With this hardware, the server can handle up to 16 RPS.

The load testing charts are shown below:

Locust charts
