DOC: update readme & add tips for large image models (#2056)
qinxuye authored Aug 10, 2024
1 parent 3e7ed86 commit c4cbd38
Showing 6 changed files with 160 additions and 70 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -34,14 +34,14 @@ potential of cutting-edge AI models.
- Support speech recognition model: [#929](https://github.com/xorbitsai/inference/pull/929)
- Metrics support: [#906](https://github.com/xorbitsai/inference/pull/906)
### New Models
- Built-in support for [CogVideoX](https://github.com/THUDM/CogVideo): [#2049](https://github.com/xorbitsai/inference/pull/2049)
- Built-in support for [flux.1-schnell & flux.1-dev](https://www.basedlabs.ai/tools/flux1): [#2007](https://github.com/xorbitsai/inference/pull/2007)
- Built-in support for [MiniCPM-V 2.6](https://github.com/OpenBMB/MiniCPM-V): [#2031](https://github.com/xorbitsai/inference/pull/2031)
- Built-in support for [Kolors](https://huggingface.co/Kwai-Kolors/Kolors): [#2028](https://github.com/xorbitsai/inference/pull/2028)
- Built-in support for [SenseVoice](https://github.com/FunAudioLLM/SenseVoice): [#2008](https://github.com/xorbitsai/inference/pull/2008)
- Built-in support for [Mistral Large 2](https://mistral.ai/news/mistral-large-2407/): [#1944](https://github.com/xorbitsai/inference/pull/1944)
- Built-in support for [llama3.1](https://ai.meta.com/blog/meta-llama-3-1/): [#1932](https://github.com/xorbitsai/inference/pull/1932)
- Built-in support for [Mistral Nemo](https://mistral.ai/news/mistral-nemo/): [#1936](https://github.com/xorbitsai/inference/pull/1936)
- Built-in support for [CosyVoice](https://github.com/FunAudioLLM/CosyVoice): [#1881](https://github.com/xorbitsai/inference/pull/1881)
- Built-in support for [codegeex4](https://github.com/THUDM/CodeGeeX4): [#1888](https://github.com/xorbitsai/inference/pull/1888)
- Built-in support for [Gemma-2-it](https://huggingface.co/blog/gemma2): [#1774](https://github.com/xorbitsai/inference/pull/1774)
- Built-in support for [jina-reranker-v2](https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual): [#1733](https://github.com/xorbitsai/inference/pull/1733)
- Built-in support for [Qwen2](https://github.com/QwenLM/Qwen2): [#1597](https://github.com/xorbitsai/inference/pull/1597)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on LLMs that offers out-of-the-box data processing and model invocation capabilities, and allows workflow orchestration through Flow visualization.
10 changes: 5 additions & 5 deletions README_zh_CN.md
@@ -31,14 +31,14 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布
- 支持语音识别模型: [#929](https://github.com/xorbitsai/inference/pull/929)
- 增加 Metrics 统计信息: [#906](https://github.com/xorbitsai/inference/pull/906)
### 新模型
- 内置 [CogVideoX](https://github.com/THUDM/CogVideo): [#2049](https://github.com/xorbitsai/inference/pull/2049)
- 内置 [flux.1-schnell & flux.1-dev](https://www.basedlabs.ai/tools/flux1): [#2007](https://github.com/xorbitsai/inference/pull/2007)
- 内置 [MiniCPM-V 2.6](https://github.com/OpenBMB/MiniCPM-V): [#2031](https://github.com/xorbitsai/inference/pull/2031)
- 内置 [Kolors](https://huggingface.co/Kwai-Kolors/Kolors): [#2028](https://github.com/xorbitsai/inference/pull/2028)
- 内置 [SenseVoice](https://github.com/FunAudioLLM/SenseVoice): [#2008](https://github.com/xorbitsai/inference/pull/2008)
- 内置 [Mistral Large 2](https://mistral.ai/news/mistral-large-2407/): [#1944](https://github.com/xorbitsai/inference/pull/1944)
- 内置 [llama3.1](https://ai.meta.com/blog/meta-llama-3-1/): [#1932](https://github.com/xorbitsai/inference/pull/1932)
- 内置 [Mistral Nemo](https://mistral.ai/news/mistral-nemo/): [#1936](https://github.com/xorbitsai/inference/pull/1936)
- 内置 [CosyVoice](https://github.com/FunAudioLLM/CosyVoice): [#1881](https://github.com/xorbitsai/inference/pull/1881)
- 内置 [codegeex4](https://github.com/THUDM/CodeGeeX4): [#1888](https://github.com/xorbitsai/inference/pull/1888)
- 内置 [Gemma-2-it](https://huggingface.co/blog/gemma2): [#1774](https://github.com/xorbitsai/inference/pull/1774)
- 内置 [jina-reranker-v2](https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual): [#1733](https://github.com/xorbitsai/inference/pull/1733)
- 内置 [Qwen2](https://github.com/QwenLM/Qwen2): [#1597](https://github.com/xorbitsai/inference/pull/1597)
### 集成
- [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/):一个基于 LLM 大模型的开源 AI 知识库构建平台。提供了开箱即用的数据处理、模型调用、RAG 检索、可视化 AI 工作流编排等能力,帮助您轻松实现复杂的问答场景。
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): 一个涵盖了大型语言模型开发、部署、维护和优化的 LLMOps 平台。
51 changes: 29 additions & 22 deletions doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-07-30 21:20+0800\n"
"POT-Creation-Date: 2024-08-09 19:13+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -131,27 +131,31 @@ msgstr ""
msgid "Belle-whisper-large-v3-zh"
msgstr ""

#: ../../source/models/model_abilities/audio.rst:60
#: ../../source/models/model_abilities/audio.rst:57
msgid "SenseVoiceSmall"
msgstr ""

#: ../../source/models/model_abilities/audio.rst:61
msgid "Text to audio"
msgstr "文本转语音"

#: ../../source/models/model_abilities/audio.rst:62
#: ../../source/models/model_abilities/audio.rst:63
msgid "ChatTTS"
msgstr ""

#: ../../source/models/model_abilities/audio.rst:63
#: ../../source/models/model_abilities/audio.rst:64
msgid "CosyVoice"
msgstr ""

#: ../../source/models/model_abilities/audio.rst:66
#: ../../source/models/model_abilities/audio.rst:67
msgid "Quickstart"
msgstr "快速入门"

#: ../../source/models/model_abilities/audio.rst:69
#: ../../source/models/model_abilities/audio.rst:70
msgid "Transcription"
msgstr "转录"

#: ../../source/models/model_abilities/audio.rst:71
#: ../../source/models/model_abilities/audio.rst:72
msgid ""
"The Transcription API mimics OpenAI's `create transcriptions API "
"<https://platform.openai.com/docs/api-"
@@ -163,11 +167,11 @@
"可以通过 cURL、OpenAI Client 或者 Xinference 的 Python 客户端来尝试 "
"Transcription API:"

#: ../../source/models/model_abilities/audio.rst:122
#: ../../source/models/model_abilities/audio.rst:123
msgid "Translation"
msgstr "翻译"

#: ../../source/models/model_abilities/audio.rst:124
#: ../../source/models/model_abilities/audio.rst:125
msgid ""
"The Translation API mimics OpenAI's `create translations API "
"<https://platform.openai.com/docs/api-"
@@ -179,11 +183,11 @@
"通过 cURL、OpenAI Client 或 Xinference 的 Python 客户端来尝试使用 "
"Translation API:"

#: ../../source/models/model_abilities/audio.rst:174
#: ../../source/models/model_abilities/audio.rst:175
msgid "Speech"
msgstr "语音"

#: ../../source/models/model_abilities/audio.rst:176
#: ../../source/models/model_abilities/audio.rst:177
msgid ""
"The Speech API mimics OpenAI's `create speech API "
"<https://platform.openai.com/docs/api-reference/audio/createSpeech>`_. We"
@@ -194,44 +198,47 @@
"openai.com/docs/api-reference/audio/createSpeech>`_。你可以通过 cURL、"
"OpenAI Client 或者 Xinference 的 Python 客户端来尝试 Speech API:"

#: ../../source/models/model_abilities/audio.rst:179
#: ../../source/models/model_abilities/audio.rst:180
msgid "Speech API use non-stream by default as"
msgstr "Speech API 默认使用非流式"

#: ../../source/models/model_abilities/audio.rst:181
#: ../../source/models/model_abilities/audio.rst:182
msgid ""
"The stream output of ChatTTS is not as good as the non-stream output, "
"please refer to: https://github.com/2noise/ChatTTS/pull/564"
msgstr ""
"ChatTTS 的流式输出不如非流式的效果好,参考:https://github.com/2noise/ChatTTS/pull/564"
"ChatTTS 的流式输出不如非流式的效果好,参考:https://github.com/2noise/"
"ChatTTS/pull/564"

#: ../../source/models/model_abilities/audio.rst:182
#: ../../source/models/model_abilities/audio.rst:183
msgid ""
"The stream requires ffmpeg<7: "
"https://pytorch.org/audio/stable/installation.html#optional-dependencies"
msgstr "流式要求 ffmpeg<7:https://pytorch.org/audio/stable/installation.html#optional-dependencies"
msgstr ""
"流式要求 ffmpeg<7:https://pytorch.org/audio/stable/installation.html#"
"optional-dependencies"

#: ../../source/models/model_abilities/audio.rst:234
#: ../../source/models/model_abilities/audio.rst:235
msgid "CosyVoice Usage"
msgstr "CosyVoice 模型使用"

#: ../../source/models/model_abilities/audio.rst:236
#: ../../source/models/model_abilities/audio.rst:237
msgid "Basic usage, launch model ``CosyVoice-300M-SFT``."
msgstr "基本使用,加载模型 ``CosyVoice-300M-SFT``。"

#: ../../source/models/model_abilities/audio.rst:285
#: ../../source/models/model_abilities/audio.rst:286
msgid "Clone voice, launch model ``CosyVoice-300M``."
msgstr "克隆声音,加载模型 ``CosyVoice-300M``。"

#: ../../source/models/model_abilities/audio.rst:308
#: ../../source/models/model_abilities/audio.rst:309
msgid "Cross lingual usage, launch model ``CosyVoice-300M``."
msgstr "跨语言使用,加载模型 ``CosyVoice-300M``。"

#: ../../source/models/model_abilities/audio.rst:327
#: ../../source/models/model_abilities/audio.rst:328
msgid "Instruction based, launch model ``CosyVoice-300M-Instruct``."
msgstr "基于指令的声音合成,加载模型 ``CosyVoice-300M-Instruct``。"

#: ../../source/models/model_abilities/audio.rst:344
#: ../../source/models/model_abilities/audio.rst:345
msgid ""
"More instructions and examples, could be found at https://fun-audio-"
"llm.github.io/ ."
93 changes: 70 additions & 23 deletions doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/image.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-06-26 12:25+0000\n"
"POT-Creation-Date: 2024-08-09 19:13+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -20,8 +20,8 @@ msgstr ""
"Generated-By: Babel 2.14.0\n"

#: ../../source/models/model_abilities/image.rst:5
msgid "Images (Experimental)"
msgstr "图像(实验性质)"
msgid "Images"
msgstr "图像"

#: ../../source/models/model_abilities/image.rst:7
msgid "Learn how to generate images with Xinference."
@@ -101,15 +101,23 @@ msgstr ""
msgid "sd3-medium"
msgstr ""

#: ../../source/models/model_abilities/image.rst:47
#: ../../source/models/model_abilities/image.rst:44
msgid "FLUX.1-schnell"
msgstr ""

#: ../../source/models/model_abilities/image.rst:45
msgid "FLUX.1-dev"
msgstr ""

#: ../../source/models/model_abilities/image.rst:49
msgid "Quickstart"
msgstr "快速入门"

#: ../../source/models/model_abilities/image.rst:50
#: ../../source/models/model_abilities/image.rst:52
msgid "Text-to-image"
msgstr "文生图"

#: ../../source/models/model_abilities/image.rst:52
#: ../../source/models/model_abilities/image.rst:54
msgid ""
"The Text-to-image API mimics OpenAI's `create images API "
"<https://platform.openai.com/docs/api-reference/images/create>`_. We can "
@@ -119,38 +127,77 @@
"可以通过 cURL、OpenAI Client 或 Xinference 的方式尝试使用 Text-to-image "
"API。"

#: ../../source/models/model_abilities/image.rst:108
#: ../../source/models/model_abilities/image.rst:109
msgid "Tips for Large Image models including sd3-medium, FLUX.1"
msgstr "大型图像模型部署(sd3-medium、FLUX.1 系列)贴士"

#: ../../source/models/model_abilities/image.rst:111
msgid "Useful extra parameters can be passed to launch including:"
msgstr "有用的传递给加载模型的额外参数包括:"

#: ../../source/models/model_abilities/image.rst:113
msgid ""
"If you are running ``sd3-medium`` on a GPU less than 24GB and "
"encountering out of memory, consider to add an extra param for launching "
"according to `this article "
"<https://huggingface.co/docs/diffusers/v0.29.1/en/api/pipelines/stable_diffusion/stable_diffusion_3"
"#dropping-the-t5-text-encoder-during-inference>`_."
"``--cpu_offload True``: specifying ``True`` will offload the components "
"of the model to CPU during inference in order to save memory, while "
"seeing a slight increase in inference latency. Model offloading will only"
" move a model component onto the GPU when it needs to be executed, while "
"keeping the remaining components on the CPU."
msgstr ""
"如果你在小于 24GB 的显卡上运行 ``sd3-medium`` 碰到内存不足的问题时,根据 "
"`这篇文章 <https://huggingface.co/docs/diffusers/v0.29.1/en/api/"
"pipelines/stable_diffusion/stable_diffusion_3#dropping-the-t5-text-"
"encoder-during-inference>`_ 考虑在加载模型时增加额外选项。"
"``--cpu_offload True``:指定 ``True`` 会在推理过程中将模型的组件卸载到 CPU 上以节省内存,"
"这会导致推理延迟略有增加。模型卸载仅会在需要执行时将模型组件移动到 GPU 上,同时保持其余组件在 CPU 上"

#: ../../source/models/model_abilities/image.rst:111
#: ../../source/models/model_abilities/image.rst:117
msgid ""
"xinference launch --model-name sd3-medium --model-type image "
"--text_encoder_3 None"
"``--quantize_text_encoder <text encoder layer>``: We leveraged the "
"``bitsandbytes`` library to load and quantize the T5-XXL text encoder to "
"8-bit precision. This allows you to keep using all text encoders "
"while only slightly impacting performance."
msgstr "``--quantize_text_encoder <text encoder layer>``:我们利用 ``bitsandbytes`` 库"
"加载并量化 T5-XXL 文本编码器至8位精度。这使得你能够在仅轻微影响性能的情况下继续使用全部文本编码器。"

#: ../../source/models/model_abilities/image.rst:120
msgid ""
"``--text_encoder_3 None``, for sd3-medium, removing the memory-intensive "
"4.7B parameter T5-XXL text encoder during inference can significantly "
"decrease the memory requirements with only a slight loss in performance."
msgstr ""
"``--text_encoder_3 None``,对于 sd3-medium,"
"移除在推理过程中内存密集型的47亿参数T5-XXL文本编码器可以显著降低内存需求,而仅造成性能上的轻微损失。"

#: ../../source/models/model_abilities/image.rst:124
msgid ""
"If you are trying to run large image models liek sd3-medium or FLUX.1 "
"series on GPU card that has less memory than 24GB, you may encounter OOM "
"when launching or inference. Try below solutions."
msgstr "如果你试图在显存小于24GB的GPU上运行像sd3-medium或FLUX.1系列这样的大型图像模型,"
"你在启动或推理过程中可能会遇到显存溢出(OOM)的问题。尝试以下解决方案。"

#: ../../source/models/model_abilities/image.rst:114
#: ../../source/models/model_abilities/image.rst:128
msgid "For FLUX.1 series, try to apply quantization."
msgstr "对于 FLUX.1 系列,尝试应用量化。"

#: ../../source/models/model_abilities/image.rst:134
msgid "For sd3-medium, apply quantization to ``text_encoder_3``."
msgstr "对于 sd3-medium 模型,对 ``text_encoder_3`` 应用量化。"

#: ../../source/models/model_abilities/image.rst:141
msgid "Or removing memory-intensive T5-XXL text encoder for sd3-medium."
msgstr "或者,移除 sd3-medium 模型中内存密集型的 T5-XXL 文本编码器。"

#: ../../source/models/model_abilities/image.rst:148
msgid "Image-to-image"
msgstr "图生图"

#: ../../source/models/model_abilities/image.rst:116
#: ../../source/models/model_abilities/image.rst:150
msgid "You can find more examples of Images API in the tutorial notebook:"
msgstr "你可以在教程笔记本中找到更多 Images API 的示例。"

#: ../../source/models/model_abilities/image.rst:120
#: ../../source/models/model_abilities/image.rst:154
msgid "Stable Diffusion ControlNet"
msgstr ""

#: ../../source/models/model_abilities/image.rst:123
#: ../../source/models/model_abilities/image.rst:157
msgid "Learn from a Stable Diffusion ControlNet example"
msgstr "学习一个 Stable Diffusion 控制网络的示例"


doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/vision.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-06-05 12:48+0800\n"
"POT-Creation-Date: 2024-07-28 22:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -79,11 +79,15 @@ msgstr ""
msgid ":ref:`MiniCPM-Llama3-V 2.5 <models_llm_minicpm-llama3-v-2_5>`"
msgstr ""

#: ../../source/models/model_abilities/vision.rst:33
#: ../../source/models/model_abilities/vision.rst:30
msgid ":ref:`GLM-4V <models_llm_glm-4v>`"
msgstr ""

#: ../../source/models/model_abilities/vision.rst:34
msgid "Quickstart"
msgstr "快速入门"

#: ../../source/models/model_abilities/vision.rst:35
#: ../../source/models/model_abilities/vision.rst:36
msgid ""
"Images are made available to the model in two main ways: by passing a "
"link to the image or by passing the base64 encoded image directly in the "
@@ -92,23 +96,23 @@ msgstr ""
"模型可以通过两种主要方式获取图像:通过传递图像的链接或直接在请求中传递 "
"base64 编码的图像。"

#: ../../source/models/model_abilities/vision.rst:39
#: ../../source/models/model_abilities/vision.rst:40
msgid "Example using OpenAI Client"
msgstr "使用 OpenAI 客户端的示例"

#: ../../source/models/model_abilities/vision.rst:70
#: ../../source/models/model_abilities/vision.rst:71
msgid "Uploading base 64 encoded images"
msgstr "上传 Base64 编码的图片"

#: ../../source/models/model_abilities/vision.rst:112
#: ../../source/models/model_abilities/vision.rst:113
msgid "You can find more examples of ``vision`` ability in the tutorial notebook:"
msgstr "你可以在教程笔记本中找到更多关于 ``vision`` 能力的示例。"

#: ../../source/models/model_abilities/vision.rst:116
#: ../../source/models/model_abilities/vision.rst:117
msgid "Qwen VL Chat"
msgstr ""

#: ../../source/models/model_abilities/vision.rst:119
#: ../../source/models/model_abilities/vision.rst:120
msgid "Learn vision ability from a example using qwen-vl-chat"
msgstr "通过使用 qwen-vl-chat 的示例来学习使用 LLM 的视觉能力"
