This project predicts password complexity. The target is a real value indicating how many times the password would be encountered in one million random passwords.
The data were taken from the Kaggle competition DMIA.ProductionML 2021.1 Password complexity.
The main metric is RMSLE (root mean squared logarithmic error). RMSLE is preferable when:
- the targets grow exponentially, such as population counts or average sales of a commodity over a span of years
- we care about percentage errors rather than absolute errors
- the target values span a wide range and we don't want to penalize big differences when both the predicted and the actual values are big numbers
- we want to penalize underestimates more than overestimates
You can find more information here
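As a minimal sketch of the metric (not necessarily the exact implementation used in this project), RMSLE can be computed with the standard library alone. The function name `rmsle` and the sample values are illustrative assumptions; the formula is the usual square root of the mean squared difference of `log(1 + x)` values.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) = log(1 + x), which keeps zero-valued targets well defined
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Illustrative values only: the same absolute error of 50 around a target of 100
under = rmsle([100.0], [50.0])   # prediction below the target
over = rmsle([100.0], [150.0])   # prediction above the target
print(f"underestimate: {under:.4f}, overestimate: {over:.4f}")
```

With these sample values the underestimate gets the larger error, which illustrates the asymmetry mentioned in the list above.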
- Hardware
  - CPU count: 1
  - GPU count: 1
  - GPU type: Tesla T4
- Software
  - Python version: 3.7.14
  - OS: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
Model | RMSLE |
---|---|
Vanilla Linear Regression | 0.5023 |
Random Forest Regressor | 0.5018 |
RF Regressor with hyperparams tuning | 0.5000 |
TF-IDF + Linear Regression | 0.4831 |
Custom LSTM | 0.3428 |
- Hardware
  - CPU: 4 CPU cores
  - GPU: a single GPU with at least 4 GB of GPU RAM (optional: model inference can also run on CPU only; see also CPU inference optimization)
  - RAM: 8 GB
  - System disk space: 2 GB
First, install the project dependencies with the command:
pip install -r src/requirements.txt
The project includes 3 pipelines:
- a data processing pipeline with an ETL function
- a data science pipeline with a train-test split and model fitting
- an inference pipeline that produces predictions for the test data
Current pipeline DAG looks like:
You can run pipelines from the project with:
kedro run
You can serve the model with Flask and the Waitress WSGI server.
To run the application from the existing Docker image, run
docker run -p 5001:5001 pacificus/dmia_pass_complexity
and go to http://localhost:5001/predict?password={YOUR INPUT}
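The endpoint can also be queried from code. The sketch below uses only the standard library and assumes the container from the `docker run` command above is listening on port 5001. The helper names `build_predict_url` and `predict` are hypothetical, and the response format is not documented here, so the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5001"  # port taken from the docker run command


def build_predict_url(password, base_url=BASE_URL):
    """Build the /predict URL, percent-encoding the password safely."""
    return f"{base_url}/predict?{urlencode({'password': password})}"


def predict(password, base_url=BASE_URL, timeout=10.0):
    """Call the service and return the raw response body as text.

    The response schema is an unknown here, so no parsing is attempted.
    """
    with urlopen(build_predict_url(password, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Special characters such as '&' or spaces must be encoded,
    # otherwise they would be parsed as query-string syntax.
    print(build_predict_url("p@ss w&rd"))
    # print(predict("p@ss w&rd"))  # uncomment once the container is running
```

Encoding the password matters because characters such as `&`, `#`, or `+` would otherwise change the meaning of the query string.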
To build your own Docker image with modifications, run this from the project root:
docker build -t dmia_pass_complexity src/pass_complexity/api
You can run the node unit tests with:
kedro test
There is a load test written with Locust in src/tests/load/locustfile.py.
To run the load test, follow these steps:
- install Locust with
pip install locust
- go to the load tests folder
- run the Locust web UI with the command
locust
- open http://localhost:8089/ and specify the test parameters (number of users, spawn rate, host of the running server)
- start swarming
You can also run the load tests without the web UI; see the Locust docs.
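A headless run could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the 5-minute run time is an arbitrary illustrative value, and the host assumes the Docker container from above on port 5001. The flags (`-f`, `--headless`, `--users`, `--spawn-rate`, `--run-time`, `--host`) are standard Locust command-line options.

```python
import subprocess


def build_locust_command(
    host,
    users=300,          # matches the "Number of users" used in this README
    spawn_rate=1,       # matches the "Spawn rate" used in this README
    run_time="5m",      # assumption: not specified in this README
    locustfile="src/tests/load/locustfile.py",
):
    """Return the argv list for a headless Locust run."""
    return [
        "locust",
        "-f", locustfile,
        "--headless",
        "--users", str(users),
        "--spawn-rate", str(spawn_rate),
        "--run-time", run_time,
        "--host", host,
    ]


def run_load_test(host):
    """Run Locust without the web UI; requires Locust to be installed."""
    subprocess.run(build_locust_command(host), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without a server.
    print(" ".join(build_locust_command("http://localhost:5001")))
```

Run it from the project root so that the relative `locustfile` path resolves.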
Load testing was performed with the following configuration:
- Hardware
  - CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2.59 GHz
  - RAM: 16 GB
  - System disk space: 20 GB
- Testing setup
  - Number of users: 300
  - Spawn rate: 1
With this hardware, the server can handle up to 16 requests per second (RPS).
The load testing charts are shown below.