Reorganize the user guide and update the get_started section (#2038)

* update

* adjust directory structure

* set depth 2

* check in installation.md

* check in installation.md

* update quick start

* update supported platforms

* update supported GPUs

* typo

* update

* update api_server

* update

* format the doc

* fix lint

* update generate.sh

* rollback pipeline.md

* update

* update zh_cn

* update

* fix lint

* fix lint

* fix

* remove build.md

* debug

---------

Co-authored-by: RunningLeon <[email protected]>
lvhan028 and RunningLeon authored Aug 7, 2024
1 parent a129a14 commit 08cda6d
Showing 49 changed files with 663 additions and 416 deletions.
31 changes: 14 additions & 17 deletions README.md
@@ -28,7 +28,7 @@ ______________________________________________________________________

- \[2024/08\] 🔥🔥 LMDeploy is integrated into [modelscope/swift](https://github.com/modelscope/swift) as the default accelerator for VLMs inference
- \[2024/07\] 🎉🎉 Support Llama3.1 8B, 70B and its TOOLS CALLING
- \[2024/07\] Support [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/serving/api_server_tools.md) of InternLM2.5
- \[2024/07\] Support [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/llm/api_server_tools.md) of InternLM2.5
- \[2024/06\] PyTorch engine support DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, LlaVA-Next
- \[2024/05\] Balance vision model when deploying VLMs with multiple GPUs
- \[2024/05\] Support 4-bits weight-only quantization and inference on VLMs, such as InternVL v1.5, LLaVa, InternLMXComposer2
@@ -39,8 +39,8 @@ ______________________________________________________________________
- \[2024/03\] Support DeepSeek-VL offline inference pipeline and serving.
- \[2024/03\] Support VLM offline inference pipeline and serving.
- \[2024/02\] Support Qwen 1.5, Gemma, Mistral, Mixtral, Deepseek-MOE and so on.
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE) seamless integration with [LMDeploy Serving Service](./docs/en/serving/api_server.md).
- \[2024/01\] Support for multi-model, multi-machine, multi-card inference services. For usage instructions, please refer to [here](./docs/en/serving/proxy_server.md)
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE) seamless integration with [LMDeploy Serving Service](docs/en/llm/api_server.md).
- \[2024/01\] Support for multi-model, multi-machine, multi-card inference services. For usage instructions, please refer to [here](docs/en/llm/proxy_server.md)
- \[2024/01\] Support [PyTorch inference engine](./docs/en/inference/pytorch.md), developed entirely in Python, helping to lower the barriers for developers and enable rapid experimentation with new features and technologies.

</details>
@@ -167,19 +167,16 @@ They differ in the types of supported models and the inference data type. Please

## Installation

Install lmdeploy with pip ( python 3.8+) or [from source](./docs/en/build.md)
It is recommended installing lmdeploy using pip in a conda environment (python 3.8 - 3.12):

```shell
conda create -n lmdeploy python=3.8 -y
conda activate lmdeploy
pip install lmdeploy
```

Since v0.3.0, The default prebuilt package is compiled on **CUDA 12**. However, if CUDA 11+ is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.5.3
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
For more information on installing on CUDA 11+ platform, or for instructions on building from source, please refer to the [installation guide](./docs/en/installation.md).
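
A quick way to confirm the installation is a short import check. The snippet below is only a sketch; it assumes the package exposes `__version__` and that `torch`, installed as a dependency, can see the local GPU driver.

```python
# Post-install sanity check (sketch; assumes lmdeploy exposes __version__
# and that torch was pulled in as a dependency of the prebuilt wheel).
import lmdeploy
import torch

print("lmdeploy version:", lmdeploy.__version__)
print("CUDA available:", torch.cuda.is_available())
```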

## Offline Batch Inference

Expand All @@ -195,7 +192,7 @@ print(response)
>
> `export LMDEPLOY_USE_MODELSCOPE=True`
For more information about inference pipeline, please refer to [here](./docs/en/inference/pipeline.md).
For more information about inference pipeline, please refer to [here](docs/en/llm/pipeline.md).
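
As a minimal sketch of the offline batch inference described above, the pipeline API can be used as follows; the model name is an assumption, and any LMDeploy-supported model may be substituted.

```python
# Minimal offline batch inference sketch using the lmdeploy pipeline API.
# The model name is an assumption; replace it with any supported model.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2_5-7b-chat")
responses = pipe(["Hi, please introduce yourself", "Shanghai is"])
print(responses)
```

Passing a list of prompts lets the engine batch them in a single call.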

# Tutorials

@@ -204,10 +201,10 @@ Please review [getting_started](./docs/en/get_started.md) section for the basic
For detailed user guides and advanced guides, please refer to our [tutorials](https://lmdeploy.readthedocs.io/en/latest/):

- User Guide
- [LLM Inference pipeline](./docs/en/inference/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM Inference pipeline](./docs/en/inference/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLM Serving](docs/en/serving/api_server.md)
- [VLM Serving](docs/en/serving/api_server_vl.md)
- [LLM Inference pipeline](docs/en/llm/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM Inference pipeline](docs/en/multi_modal/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLM Serving](docs/en/llm/api_server.md)
- [VLM Serving](docs/en/multi_modal/api_server_vl.md)
- [Quantization](docs/en/quantization)
- Advance Guide
- [Inference Engine - TurboMind](docs/en/inference/turbomind.md)
@@ -216,7 +213,7 @@ For detailed user guides and advanced guides, please refer to our [tutorials](ht
- [Add a new model](docs/en/advance/pytorch_new_model.md)
- gemm tuning
- [Long context inference](docs/en/advance/long_context.md)
- [Multi-model inference service](docs/en/serving/proxy_server.md)
- [Multi-model inference service](docs/en/llm/proxy_server.md)

# Third-party projects

31 changes: 14 additions & 17 deletions README_ja.md
@@ -28,7 +28,7 @@ ______________________________________________________________________

- \[2024/08\] 🔥🔥 LMDeployは[modelscope/swift](https://github.com/modelscope/swift)に統合され、VLMs推論のデフォルトアクセラレータとなりました
- \[2024/07\] 🎉🎉 Llama3.1 8B、70Bおよびそのツールコールをサポート
- \[2024/07\] [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e)全シリーズモデル、[InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md)およびInternLM2.5の[ファンクションコール](docs/en/serving/api_server_tools.md)をサポート
- \[2024/07\] [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e)全シリーズモデル、[InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md)およびInternLM2.5の[ファンクションコール](docs/en/llm/api_server_tools.md)をサポート
- \[2024/06\] PyTorchエンジンはDeepSeek-V2およびいくつかのVLMs、例えばCogVLM2、Mini-InternVL、LlaVA-Nextをサポート
- \[2024/05\] 複数のGPUでVLMsをデプロイする際にビジョンモデルをバランスさせる
- \[2024/05\] InternVL v1.5、LLaVa、InternLMXComposer2などのVLMsで4ビットの重みのみの量子化と推論をサポート
@@ -39,8 +39,8 @@ ______________________________________________________________________
- \[2024/03\] DeepSeek-VLのオフライン推論パイプラインとサービングをサポート
- \[2024/03\] VLMのオフライン推論パイプラインとサービングをサポート
- \[2024/02\] Qwen 1.5、Gemma、Mistral、Mixtral、Deepseek-MOEなどをサポート
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE)[LMDeployサービングサービス](./docs/en/serving/api_server.md)とシームレスに統合されました
- \[2024/01\] 複数モデル、複数マシン、複数カードの推論サービスをサポート。使用方法は[こちら](./docs/en/serving/proxy_server.md)を参照してください
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE)[LMDeployサービングサービス](./docs/en/llm/api_server.md)とシームレスに統合されました
- \[2024/01\] 複数モデル、複数マシン、複数カードの推論サービスをサポート。使用方法は[こちら](./docs/en/llm/proxy_server.md)を参照してください
- \[2024/01\] [PyTorch推論エンジン](./docs/en/inference/pytorch.md)をサポートし、完全にPythonで開発されており、開発者の障壁を下げ、新機能や技術の迅速な実験を可能にします

</details>
@@ -168,19 +168,16 @@ LMDeployは、[TurboMind](./docs/en/inference/turbomind.md)および[PyTorch](./

## インストール

pip(python 3.8+)を使用してlmdeployをインストールするか、[ソースからインストール](./docs/en/build.md)します
クリーンなconda環境(Python 3.8 - 3.12)でlmdeployをインストールすることをお勧めします。

```shell
conda create -n lmdeploy python=3.8 -y
conda activate lmdeploy
pip install lmdeploy
```

v0.3.0以降、デフォルトのプリビルドパッケージは**CUDA 12**でコンパイルされています。ただし、CUDA 11+が必要な場合は、次のコマンドでlmdeployをインストールできます:

```shell
export LMDEPLOY_VERSION=0.5.3
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
v0.3.0から、デフォルトの事前構築済みパッケージはCUDA 12でコンパイルされています。
CUDA 11+プラットフォームでのインストールに関する情報、またはソースからのビルド手順については、[インストールガイドを](docs/en/installation.md)参照してください。

## オフラインバッチ推論

Expand All @@ -196,7 +193,7 @@ print(response)
>
> `export LMDEPLOY_USE_MODELSCOPE=True`
推論パイプラインに関する詳細情報は[こちら](./docs/en/inference/pipeline.md)を参照してください。
推論パイプラインに関する詳細情報は[こちら](./docs/en/llm/pipeline.md)を参照してください。

# チュートリアル

@@ -205,10 +202,10 @@ LMDeployの基本的な使用方法については、[getting_started](./docs/en
詳細なユーザーガイドと高度なガイドについては、[チュートリアル](https://lmdeploy.readthedocs.io/en/latest/)を参照してください:

- ユーザーガイド
- [LLM推論パイプライン](./docs/en/inference/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM推論パイプライン](./docs/en/inference/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLMサービング](docs/en/serving/api_server.md)
- [VLMサービング](docs/en/serving/api_server_vl.md)
- [LLM推論パイプライン](./docs/en/llm/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM推論パイプライン](./docs/en/multi_modal/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLMサービング](docs/en/llm/api_server.md)
- [VLMサービング](docs/en/multi_modal/api_server_vl.md)
- [量子化](docs/en/quantization)
- 高度なガイド
- [推論エンジン - TurboMind](docs/en/inference/turbomind.md)
@@ -217,7 +214,7 @@ LMDeployの基本的な使用方法については、[getting_started](./docs/en
- [新しいモデルの追加](docs/en/advance/pytorch_new_model.md)
- gemmチューニング
- [長文推論](docs/en/advance/long_context.md)
- [マルチモデル推論サービス](docs/en/serving/proxy_server.md)
- [マルチモデル推論サービス](docs/en/llm/proxy_server.md)

# サードパーティプロジェクト

30 changes: 13 additions & 17 deletions README_zh-CN.md
@@ -28,7 +28,7 @@ ______________________________________________________________________

- \[2024/08\] 🔥🔥 LMDeploy现已集成至 [modelscope/swift](https://github.com/modelscope/swift),成为 VLMs 推理的默认加速引擎
- \[2024/07\] 🎉🎉 支持 Llama3.1 8B 和 70B 模型,以及工具调用功能
- \[2024/07\] 支持 [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) 全系列模型,[InternLM-XComposer2.5](docs/zh_cn/multi_modal/xcomposer2d5.md) 模型和 InternLM2.5 的 [function call 功能](docs/zh_cn/serving/api_server_tools.md)
- \[2024/07\] 支持 [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) 全系列模型,[InternLM-XComposer2.5](docs/zh_cn/multi_modal/xcomposer2d5.md) 模型和 InternLM2.5 的 [function call 功能](docs/zh_cn/llm/api_server_tools.md)
- \[2024/06\] PyTorch engine 支持了 DeepSeek-V2 和若干 VLM 模型推理, 比如 CogVLM2,Mini-InternVL,LlaVA-Next
- \[2024/05\] 在多 GPU 上部署 VLM 模型时,支持把视觉部分的模型均分到多卡上
- \[2024/05\] 支持InternVL v1.5, LLaVa, InternLMXComposer2 等 VLMs 模型的 4bit 权重量化和推理
@@ -39,8 +39,8 @@ ______________________________________________________________________
- \[2024/03\] 支持 DeepSeek-VL 的离线推理 pipeline 和推理服务
- \[2024/03\] 支持视觉-语言模型(VLM)的离线推理 pipeline 和推理服务
- \[2024/02\] 支持 Qwen 1.5、Gemma、Mistral、Mixtral、Deepseek-MOE 等模型
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE) 发布,支持无缝接入[LMDeploy Serving Service](./docs/zh_cn/serving/api_server.md)
- \[2024/01\] 支持多模型、多机、多卡推理服务。使用方法请参考[此处](./docs/zh_cn/serving/proxy_server.md)
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE) 发布,支持无缝接入[LMDeploy Serving Service](docs/zh_cn/llm/api_server.md)
- \[2024/01\] 支持多模型、多机、多卡推理服务。使用方法请参考[此处](docs/zh_cn/llm/proxy_server.md)
- \[2024/01\] 增加 [PyTorch 推理引擎](./docs/zh_cn/inference/pytorch.md),作为 TurboMind 引擎的补充。帮助降低开发门槛,和快速实验新特性、新技术

</details>
@@ -168,19 +168,15 @@ LMDeploy 支持 2 种推理引擎: [TurboMind](./docs/zh_cn/inference/turbomin

## 安装

使用 pip ( python 3.8+) 安装 LMDeploy,或者[源码安装](./docs/zh_cn/build.md)
我们推荐在一个干净的conda环境下(python3.8 - 3.12),安装 lmdeploy:

```shell
conda create -n lmdeploy python=3.8 -y
conda activate lmdeploy
pip install lmdeploy
```

自 v0.3.0 起,LMDeploy 预编译包默认基于 CUDA 12 编译。如果需要在 CUDA 11+ 下安装 LMDeploy,请执行以下命令:

```shell
export LMDEPLOY_VERSION=0.5.3
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
自 v0.3.0 起,LMDeploy 预编译包默认基于 CUDA 12 编译。如果需要在 CUDA 11+ 下安装 LMDeploy,或者源码安装 LMDeploy,请参考[安装文档](./docs/zh_cn/installation.md)

## 离线批处理

@@ -196,7 +192,7 @@ print(response)
>
> `export LMDEPLOY_USE_MODELSCOPE=True`
关于 pipeline 的更多推理参数说明,请参考[这里](./docs/zh_cn/inference/pipeline.md)
关于 pipeline 的更多推理参数说明,请参考[这里](docs/zh_cn/llm/pipeline.md)

# 用户教程

@@ -205,10 +201,10 @@ print(response)
为了帮助用户更进一步了解 LMDeploy,我们准备了用户指南和进阶指南,请阅读我们的[文档](https://lmdeploy.readthedocs.io/zh-cn/latest/)

- 用户指南
- [LLM 推理 pipeline](./docs/zh_cn/inference/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM 推理 pipeline](./docs/zh_cn/inference/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLM 推理服务](./docs/zh_cn/serving/api_server.md)
- [VLM 推理服务](./docs/zh_cn/serving/api_server_vl.md)
- [LLM 推理 pipeline](docs/zh_cn/llm/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM 推理 pipeline](docs/zh_cn/multi_modal/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLM 推理服务](docs/zh_cn/llm/api_server.md)
- [VLM 推理服务](docs/zh_cn/multi_modal/api_server_vl.md)
- [模型量化](./docs/zh_cn/quantization)
- 进阶指南
- [推理引擎 - TurboMind](./docs/zh_cn/inference/turbomind.md)
@@ -217,7 +213,7 @@ print(response)
- [支持新模型](./docs/zh_cn/advance/pytorch_new_model.md)
- gemm tuning
- [长文本推理](./docs/zh_cn/advance/long_context.md)
- [多模型推理服务](./docs/zh_cn/serving/proxy_server.md)
- [多模型推理服务](docs/zh_cn/llm/proxy_server.md)

# 社区项目

2 changes: 1 addition & 1 deletion docs/en/advance/debug_turbomind.md
@@ -4,7 +4,7 @@ Turbomind is implemented in C++, which is not as easy to debug as Python. This d

## Prerequisite

First, complete the local compilation according to the commands in [Build in localhost](../build.md).
First, complete the local compilation according to the commands in [Install from source](../installation.md).

## Configure Python debug environment

6 changes: 1 addition & 5 deletions docs/en/benchmark/evaluate_with_opencompass.md
@@ -8,11 +8,7 @@ In this part, we are going to setup the environment for evaluation.

### Install lmdeploy

Install lmdeploy through pip (python 3.8+). If you want to install from source, you can refer to [build.md](../build.md).

```shell
pip install lmdeploy
```
Please follow the [installation guide](../installation.md) to install lmdeploy.

### Install OpenCompass

2 changes: 1 addition & 1 deletion docs/en/benchmark/profile_api_server.md
@@ -41,7 +41,7 @@ In this section, we take [internlm/internlm-7b](https://huggingface.co/internlm/
lmdeploy serve api_server internlm/internlm-7b
```

If you would like to change the server's port or other parameters, such as inference engine, max batch size and etc., please run `lmdeploy serve api_server -h` or read [this](../serving/api_server.md) guide to get the detailed explanation.
If you would like to change the server's port or other parameters, such as inference engine, max batch size and etc., please run `lmdeploy serve api_server -h` or read [this](../llm/api_server.md) guide to get the detailed explanation.
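
Once the server is up, it can be exercised with any OpenAI-style client. The sketch below assumes the default port 23333 and the `v1/chat/completions` route; adjust both if the server was launched with different options.

```python
# Minimal request sketch against the api_server started above.
# Assumes the default port 23333; change the URL if --server-port was set.
# The served model id can be double-checked via GET /v1/models.
import requests

resp = requests.post(
    "http://0.0.0.0:23333/v1/chat/completions",
    json={
        "model": "internlm/internlm-7b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(resp.json())
```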

### Profile
