This repository provides demonstration programs that apply the Λ-Split to LLMs, including Llama 2, and diffusion models, including Stable Diffusion XL (SDXL).
Demo videos:
- text_generation_demo.mp4
- text_generation_demo_with_HTTP_720p.mp4
- image_generation_demo_720p.mp4
Requirements:
- Python 3.8 or later
- Install the dependencies:
  python3 -m pip install -r requirements.txt
Text generation (standalone):
- You must agree to Meta's license as stated on the Hugging Face page.
- Execute the following commands:
  cd text_generation
  python3 main.py
Text generation with HTTP communication (cloud/edge split):
- You must agree to Meta's license as stated on the Hugging Face page.
- Prepare two computers: one as the cloud server and one as the local (edge) device.
- Execute the following commands on each computer (a sketch of the edge-to-cloud exchange follows this list):
  Cloud:
    cd text_generation
    python3 cloud_main.py
  Local:
    cd text_generation
    python3 edge_main.py
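For reference, the sketch below shows one way the local (edge) process could ship intermediate hidden states to the cloud process over HTTP and receive the cloud sub-model's output back. It is a minimal illustration, not the repository's actual protocol: the endpoint name /infer_cloud, the use of the requests library, and the torch.save-based serialization are all assumptions.

```python
# Minimal sketch of the edge-side round trip, NOT the repository's actual protocol.
# Assumptions: the endpoint name "/infer_cloud", the use of the requests library,
# and torch.save-based serialization of the hidden states are illustrative only.
import io

import requests
import torch


def send_hidden_states_to_cloud(
    hidden_states: torch.Tensor,
    url: str = "http://<cloud-server>:8000/infer_cloud",
) -> torch.Tensor:
    """POST intermediate hidden states to the cloud sub-model and return
    the hidden states it sends back for the final edge-side sub-model."""
    buffer = io.BytesIO()
    torch.save(hidden_states.cpu(), buffer)             # serialize the tensor
    response = requests.post(url, data=buffer.getvalue(), timeout=60)
    response.raise_for_status()
    return torch.load(io.BytesIO(response.content))     # cloud output, as a tensor
```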
Image generation:
- Execute the following commands:
  cd image_generation
  python3 main.py
Directory structure:

lambda_split/
│
├─ text_generation/
│ ├─ main.py
│ ├─ cloud_main.py : For HTTP communication
│ ├─ edge_main.py : For HTTP communication
│ └─ src/
│ ├─ base.py
│ ├─ cloud.py
│ ├─ edge.py
│ ├─ split_models.py : Definition of split sub-models.
│ └─ utils.py
│
├─ image_generation/
│ ├─ main.py
│ ├─ evaluation.py
│ └─ src/
│ ├─ quantizers.py : For quantization
│ ├─ split_pipelines.py : Definition of split sub-models.
│ └─ utils.py
│
└─ requirements.txt
Implementation notes (a toy sketch of both ideas follows this list):
- Override the forward method of the models so that the inference layers are split correctly at inference time (implemented by commenting out the unused parts of the forward method of FirstLlamaModel etc. in src/split_models.py).
- Replace unused layers with identity layers to reduce memory usage (implemented by the replace_unused_layers_with_identity method of FirstLlamaModel etc. in src/split_models.py).
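As a rough illustration of these two points, the toy model below replaces the layers outside its assigned range with nn.Identity and executes only the assigned slice in its forward pass. The class and method names (ToySplitModel, the index arguments, etc.) are illustrative assumptions and do not reproduce the repository's FirstLlamaModel code.

```python
# Toy illustration of the two implementation notes above; the class and method
# names here are assumptions and do not reproduce FirstLlamaModel.
import torch
import torch.nn as nn


class ToySplitModel(nn.Module):
    def __init__(self, num_layers: int = 8, hidden: int = 64):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_layers))

    def replace_unused_layers_with_identity(self, first_index: int, last_index: int) -> None:
        """Keep only layers [first_index, last_index); swap the rest for
        parameter-free nn.Identity modules so their weights can be freed."""
        for i in range(len(self.layers)):
            if not (first_index <= i < last_index):
                self.layers[i] = nn.Identity()

    def forward(self, hidden_states: torch.Tensor,
                first_index: int, last_index: int) -> torch.Tensor:
        # Run only the assigned slice of layers, mirroring the idea of
        # "commenting out" the unused parts of the overridden forward method.
        for layer in self.layers[first_index:last_index]:
            hidden_states = layer(hidden_states)
        return hidden_states


# Example: a sub-model responsible for the first three layers only.
sub_model = ToySplitModel()
sub_model.replace_unused_layers_with_identity(0, 3)
out = sub_model(torch.randn(1, 64), 0, 3)
```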