Build a web service for Japanese conversation using Llama 3 8B. The model used is Llama-3-ELYZA-JP-8B-GGUF.
This repo is just a sample showing how to use Docker to create the composed containers.
The flow is as follows:
- create the llama API
- test your API (can be skipped)
- create a requirements list
- write a Dockerfile for the llama container
- write docker-compose.yml
- open the Node-RED webpage
- edit your service
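The real API lives in `API-python/llama_api.py`. As a rough illustration of the shape such a service takes, here is a minimal standard-library sketch with the model call stubbed out; the `/chat` endpoint name, port 8000, and the `prompt`/`reply` JSON fields are assumptions for illustration, not the repo's actual interface:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_reply(prompt: str) -> str:
    # Stub: the real service would invoke llama-cpp-python here with the
    # Llama-3-ELYZA-JP-8B-q4_k_m.gguf model and return its completion.
    return f"(echo) {prompt}"

class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body and run the (stubbed) model on the prompt.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        reply = generate_reply(body.get("prompt", ""))
        data = json.dumps({"reply": reply}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        # Keep the server quiet during testing.
        pass

if __name__ == "__main__":
    # Bind on all interfaces so the Node-RED container can reach it.
    HTTPServer(("0.0.0.0", 8000), ChatHandler).serve_forever()
```

The actual repo uses FastAPI (see `test_llama_fastapi.py`), but the request/response shape is the same idea: POST a prompt, get back generated Japanese text.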
Run this on Linux:
sudo wget https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF/resolve/main/Llama-3-ELYZA-JP-8B-q4_k_m.gguf
or download it yourself from:
https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF/blob/main/Llama-3-ELYZA-JP-8B-q4_k_m.gguf
Clone the repo:
git clone https://github.com/WUBIN10086/Llama3-JP-using-NoteRED.git
Then add the model file you downloaded above to the repo folder. The structure should look like:
web_llama
├── API-python
│ ├── llama_api.py
│ └── test_llama_fastapi.py
├── Llama-3-ELYZA-JP-8B-q4_k_m.gguf
├── requirements.txt
└── docker-compose.yml
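For orientation, a docker-compose.yml for this layout might look roughly like the sketch below. The service names, port mappings, and mount paths are assumptions, not the repo's actual file; the official `nodered/node-red` image does serve its editor on port 1880.

```yaml
version: "3"
services:
  llama-api:
    build: .          # builds from the Dockerfile for the llama container
    ports:
      - "8000:8000"   # assumed API port
    volumes:
      # Mount the model file so it is not baked into the image.
      - ./Llama-3-ELYZA-JP-8B-q4_k_m.gguf:/app/Llama-3-ELYZA-JP-8B-q4_k_m.gguf
  node-red:
    image: nodered/node-red
    ports:
      - "1880:1880"   # Node-RED editor
```

Inside the compose network, Node-RED can reach the API at its service name (e.g. http://llama-api:8000) rather than localhost.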
!!! Don't forget to add the model file (Llama-3-ELYZA-JP-8B-q4_k_m.gguf).
run this command:
docker-compose up --build
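Once the containers are up, you can smoke-test the llama API directly before wiring up Node-RED. The endpoint path, port, and JSON fields below are assumptions; check `API-python/llama_api.py` for the real ones:

```python
import json
import urllib.request

# Assumed endpoint; adjust to match llama_api.py.
API_URL = "http://127.0.0.1:8000/chat"

def build_request(prompt: str) -> urllib.request.Request:
    """Build a JSON POST request for the conversation API."""
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Send a Japanese prompt and print the generated reply.
    with urllib.request.urlopen(build_request("こんにちは")) as resp:
        print(json.loads(resp.read()))
```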
Open http://127.0.0.1:1880 in your browser.
Import the contents of NoteRED.txt into the Node-RED editor.