This project uses HuggingFace Transformers and Flask as a backend to run and deploy a large language model that generates log suggestions. The frontend is a Visual Studio Code extension written in JavaScript.
- Docker (Docker Desktop)
- Python (tested with 3.11/3.12)
- Open terminal in project root
- Create a Python virtual environment
  - With VS Code: `F1` > `Python: Create Environment...` > `venv`
  - In terminal: `python -m venv .venv`
- Activate the virtual environment
  - On Windows: `.venv\Scripts\activate`
  - On macOS/Linux: `source .venv/bin/activate`
- Install dependencies and activate pre-commit: `pip install -r requirements-dev.txt; pre-commit install`
- Rename `.env.exemple` to `.env` and add `HF_TOKEN` in the file (see the example below)
- Start backend: `python app.py`
- The server will start on http://localhost:8888
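For example, after renaming, the `.env` file should contain your HuggingFace token (the value below is a placeholder):

```
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
```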
There are several ways to run pre-commit:
- Commit the code
- Run `pre-commit run` (executed on changed files)
- Run `pre-commit run --all-files` (force execution on all files)
- Install CUDA on your local machine: CUDA Toolkit
- Install Torch with CUDA support: Start Locally - PyTorch (you can verify the install with the snippet below)
- Start backend: `python app.py`
- The server will start on http://localhost:8888
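To confirm that PyTorch can see the GPU before starting the backend, you can run a quick check (standard PyTorch API):

```python
import torch

# True only if a compatible CUDA device is visible to PyTorch
print(torch.cuda.is_available())

# Name of the default CUDA device, if one is present
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```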
If a compatible CUDA device is detected, the container will be executed on the GPU.
- Open Docker Desktop
- Create and run the container: `docker-compose up --build`
- The server will start on http://localhost:8888
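For the GPU to be visible inside the container, the compose file must request it. As a rough sketch (illustrative only, not necessarily the project's actual `docker-compose.yml`; the service name `backend` is assumed):

```yaml
# Illustrative sketch of a GPU reservation in docker-compose.yml
services:
  backend:           # assumed service name
    build: .
    ports:
      - "8888:8888"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```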
- Install Nvidia CUDA:
  ```
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
  sudo dpkg -i cuda-keyring_1.1-1_all.deb
  sudo apt-get update
  sudo apt-get -y install cuda-toolkit-12-6
  ```
- Install Nvidia drivers:
  ```
  sudo apt-get install -y nvidia-open
  sudo apt-get install -y cuda-drivers
  ```
- Configure the nvidia-docker runtime:
  ```
  sudo apt-get install -y nvidia-docker2
  sudo nvidia-ctk runtime configure --runtime=docker
  sudo systemctl restart docker
  ```
- List the processes running on the GPU with `nvidia-smi`
- In Docker Desktop for Windows, enable Ubuntu WSL integration: Options > Resources > WSL Integration.
- Open terminal in project root
- Install dev dependencies: `pip install -r requirements-dev.txt`
- Run tests: `pytest`
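As a starting point for new tests, here is a minimal sketch of an endpoint test, assuming `app.py` exposes a Flask instance named `app` (adapt the import to the project's actual layout):

```python
# test_predict.py - minimal sketch; the import below is an assumption
from app import app


def test_predict_returns_content():
    client = app.test_client()
    response = client.post(
        "/predict",
        json={"prompt": "Hello", "max_new_tokens": 8},
    )
    assert response.status_code == 200
    assert "content" in response.get_json()
```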
Once the application is running, you can find the documentation at http://localhost:8888/docs
This endpoint allows the client to ask a question to the model. It accepts a JSON payload with a `prompt` key for the user input query. The `max_new_tokens` key specifies the maximum number of new tokens the model should generate in response to the prompt; if not specified, it defaults to 128.
Endpoint: /predict
Method: POST
Request body:
{
  "prompt": "USER_PROMPT",
  "max_new_tokens": MAX_TOKENS
}
Response body:
{
  "content": "MODEL_RESPONSE"
}
- `content` (str): Model response.
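For example, the endpoint can be called from Python with the `requests` library (the prompt is illustrative):

```python
import requests

# Ask the model for a completion; server assumed to run on localhost:8888
resp = requests.post(
    "http://localhost:8888/predict",
    json={"prompt": "Suggest a log message for a failed DB connection:", "max_new_tokens": 64},
)
print(resp.json()["content"])
```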
This endpoint allows the client to update the current model by specifying the model ID. It accepts a JSON payload with a `hf_model_id` key and responds with the operation status.
Endpoint: /change_model
Method: POST
Request body:
{
  "hf_model_id": "MODEL_ID"
}
Response body:
{
  "completed": boolean,
  "model_name": "MODEL_ID"
}
- `completed` (bool): True if the model change was successful, False otherwise.
- `model_name` (str): The name / ID of the model currently in use (for confirmation).
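For example (the model ID below is illustrative; any HuggingFace model ID accessible with your token should work):

```python
import requests

# Switch the backend to another HuggingFace model
resp = requests.post(
    "http://localhost:8888/change_model",
    json={"hf_model_id": "mistralai/Mistral-7B-Instruct-v0.2"},
)
print(resp.json())  # {"completed": true, "model_name": "mistralai/Mistral-7B-Instruct-v0.2"}
```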
This endpoint allows the client to update the current HuggingFace token by specifying a token. It accepts a JSON payload with a `token` key and responds with the operation status.
Endpoint: /change_token
Method: POST
Request body:
{
  "token": "HUGGINGFACE_TOKEN"
}
Response body:
{
  "completed": boolean
}
- `completed` (bool): True if the token change was successful, False otherwise.
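For example (the token value is a placeholder):

```python
import requests

# Update the HuggingFace token used by the backend
resp = requests.post(
    "http://localhost:8888/change_token",
    json={"token": "hf_xxxxxxxxxxxxxxxx"},
)
print(resp.json())  # {"completed": true}
```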
This endpoint allows the client to get basic information about the active model in the application.
Endpoint: /model_info
Method: GET
Response body:
{
  "model_name": "MODEL_ID",
  "device": "DEVICE"
}
- `model_name` (str): The name of the model currently in use.
- `device` (str): The device executing the model, either "cpu" or "cuda".
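For example:

```python
import requests

# Retrieve the active model name and execution device
resp = requests.get("http://localhost:8888/model_info")
print(resp.json())  # e.g. {"model_name": "MODEL_ID", "device": "cuda"}
```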
Example cURL request to the /predict endpoint:

```
curl --request POST \
  --url http://localhost:8888/predict \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:","max_new_tokens": 128}'
```