Home

Welcome to the hcaptcha-model-factory wiki!

This project is a 🏗 factory for hCaptcha binary classification models.

If this project is helpful to you, please leave a ⭐ star!

Introduction and motivation

Image recognition is one of the most common CAPTCHA categories, offered by many CAPTCHA services such as hCaptcha and reCAPTCHA. However, it can easily be solved with deep learning: collecting and labeling data is the only thing you need to do.

For now, any image recognition challenge can be regarded as a binary classification task. For each image you only need to decide "click" or "not click", that is, "true" or "false".
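To make the "click" or "not click" decision concrete, here is a minimal sketch. It assumes the classifier has already produced one "yes" score per tile; the function name `tiles_to_click`, the 0.5 threshold, and the example scores are all hypothetical and not taken from this project.

```python
from typing import List


def tiles_to_click(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Return the indices of tiles whose "yes" score reaches the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Hypothetical scores for a 3x3 challenge grid, one score per tile.
scores = [0.91, 0.08, 0.40, 0.77, 0.02, 0.65, 0.10, 0.55, 0.30]
print(tiles_to_click(scores))
```

Each tile is judged independently, which is why one small binary model per challenge type can be enough.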

So this project serves as a pluggable module for hcaptcha-challenger that can be iterated on and updated quickly. When a new challenge appears, training a simple ResNet model for it is enough.

The ResNetMini model is only 295 KB in ONNX format. But I don't know how big the hCaptcha generation model is, haha!

Make AI great again!

File structure

In progress...

Model

ResNetMini
- size: 295 KB
- params: TBD
- structure: TBD

Usage

Library: Python 3.7+, PyTorch>=1.8.1 [Optional: CUDA>=10.2]

System: Windows/Linux/Mac

(It should work on any system where PyTorch can be installed, but I have only tested it on Windows. Please keep that in mind; PRs are welcome!)

Preparing

Run the following commands.

git clone https://github.com/beiyuouo/hcaptcha-model-factory.git
cd hcaptcha-model-factory
pip install -r requirements.txt
cd src

Configuration

When a new task comes, modify the task_name variable in config.py. You may also need to tune the parameters in the training settings section.

Label data

I don't think you need a labeling tool for this task... Just dragging each image into the corresponding label folder is enough. It's easy, right?

Split data

Place your labeled data in data\[task]\origin\[yes|bad]. It will be split automatically.

python main.py --split
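For readers curious what an automatic split of the label folders could look like, here is a rough sketch using only the standard library. It is not the project's actual implementation: the function name `split_dataset`, the train/val/test subset names, the 80/10/10 ratios, and the output layout are all assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path


def split_dataset(origin: Path, out: Path, ratios=(0.8, 0.1, 0.1), seed=0):
    """Copy files from origin/<label>/ into out/<subset>/<label>/.

    Returns the number of files copied into each subset.
    The subset names and ratios are assumptions, not the project's settings.
    """
    subsets = ("train", "val", "test")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {subset: 0 for subset in subsets}
    for label_dir in sorted(p for p in origin.iterdir() if p.is_dir()):
        files = sorted(p for p in label_dir.iterdir() if p.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * ratios[0])
        n_val = int(len(files) * ratios[1])
        chunks = {
            "train": files[:n_train],
            "val": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],  # whatever remains
        }
        for subset, chunk in chunks.items():
            dest = out / subset / label_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in chunk:
                shutil.copy2(f, dest / f.name)
            counts[subset] += len(chunk)
    return counts
```

Splitting each label folder separately keeps the yes/bad balance roughly the same in every subset.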

Train

python main.py --mode train

Val

python main.py --mode val

Test

In progress...

Full workflow

python main.py --split --mode trainval
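The flags used above could be wired up roughly as follows. This is only a sketch of how such a command line might be parsed with argparse, not the contents of main.py; the helper names `build_parser` and `plan_steps` are hypothetical, and it assumes that trainval means "train, then validate".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --split and --mode flags shown in this wiki."""
    parser = argparse.ArgumentParser(description="sketch of the main.py flags")
    parser.add_argument("--split", action="store_true",
                        help="split the labeled data before anything else")
    parser.add_argument("--mode", choices=["train", "val", "trainval"],
                        default=None, help="which stage(s) to run")
    return parser


def plan_steps(argv):
    """Translate command-line arguments into an ordered list of steps."""
    args = build_parser().parse_args(argv)
    steps = []
    if args.split:
        steps.append("split")
    if args.mode in ("train", "trainval"):
        steps.append("train")
    if args.mode in ("val", "trainval"):  # assumption: trainval = train + val
        steps.append("val")
    return steps


print(plan_steps(["--split", "--mode", "trainval"]))
```

Under these assumptions, the full workflow command maps to the split, train, and val steps in that order.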

Copyright @BJ.YAN
