doc: add example of deploying api server to Kubernetes (#1488)
* doc: add example of deploying api server to Kubernetes

* doc: fix lint error in yaml files

* doc: simplify file hierarchy
uzuku authored May 6, 2024
1 parent 79e5fff commit 499b75b
Showing 4 changed files with 109 additions and 0 deletions.
11 changes: 11 additions & 0 deletions docs/en/serving/api_server.md
@@ -37,6 +37,17 @@ docker run --runtime nvidia --gpus all \

The parameters of `api_server` are the same as those described in the "[option 1](#option-1-launching-with-lmdeploy-cli)" section.

### Option 3: Deploying to Kubernetes cluster

Connect to a running Kubernetes cluster and deploy the internlm2-chat-7b model service with the [kubectl](https://kubernetes.io/docs/reference/kubectl/) command-line tool (replace `<your token>` with your Hugging Face Hub token):

```shell
sed 's/{{HUGGING_FACE_HUB_TOKEN}}/<your token>/' k8s/deployment.yaml | kubectl create -f - \
&& kubectl create -f k8s/service.yaml
```

In the example above, the model data is placed on the node's local disk (hostPath). If multiple replicas are desired, consider replacing it with highly available shared storage, which can be mounted into the container using a [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/).
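As a sketch of that swap, a PersistentVolumeClaim backed by shared storage could replace the hostPath volume. The `storageClassName` of `nfs-client` and the `100Gi` size below are assumptions; substitute whatever storage class and capacity your cluster actually provides:

```yaml
# Hypothetical claim against a ReadWriteMany-capable storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-data-pvc
spec:
  accessModes:
    - ReadWriteMany        # required so multiple replicas can share the model cache
  storageClassName: nfs-client   # assumption -- use your cluster's class
  resources:
    requests:
      storage: 100Gi
```

The `model-data` volume in `k8s/deployment.yaml` would then reference `persistentVolumeClaim: {claimName: model-data-pvc}` instead of the hostPath entry.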

## RESTful API

LMDeploy's RESTful API is compatible with the following three OpenAI interfaces:
11 changes: 11 additions & 0 deletions docs/zh_cn/serving/api_server.md
@@ -57,6 +57,17 @@ COPY . .
CMD ["lmdeploy", "serve", "api_server", "liuhaotian/llava-v1.6-34b"]
```

### Option 3: Deploying to a Kubernetes cluster

Using the [kubectl](https://kubernetes.io/docs/reference/kubectl/) command-line tool, connect to a running Kubernetes cluster and deploy the internlm2-chat-7b model service. An example follows (replace `<your token>` with your Hugging Face Hub token):

```shell
sed 's/{{HUGGING_FACE_HUB_TOKEN}}/<your token>/' k8s/deployment.yaml | kubectl create -f - \
&& kubectl create -f k8s/service.yaml
```

In this example, the model data resides on the node's local disk (hostPath). For multi-replica deployments, consider replacing it with highly available shared storage, mounted into the container via a [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/).

## RESTful API

LMDeploy's RESTful API is compatible with the following three OpenAI interfaces:
72 changes: 72 additions & 0 deletions k8s/deployment.yaml
@@ -0,0 +1,72 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: internlm2-chat-7b
  name: internlm2-chat-7b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: internlm2-chat-7b
  strategy: {}
  template:
    metadata:
      labels:
        app: internlm2-chat-7b
    spec:
      containers:
        - name: internlm2-chat-7b
          image: openmmlab/lmdeploy:latest
          command:
            - /bin/sh
            - -c
          args:
            - "lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333"
          env:
            - name: NCCL_LAUNCH_MODE
              value: GROUP
            - name: HUGGING_FACE_HUB_TOKEN
              value: "{{HUGGING_FACE_HUB_TOKEN}}"
          ports:
            - containerPort: 23333
              protocol: TCP
              name: main
          resources:
            limits:
              cpu: "16"
              memory: 64Gi
              nvidia.com/gpu: "1"
            requests:
              cpu: "16"
              memory: 64Gi
              nvidia.com/gpu: "1"
          readinessProbe:
            failureThreshold: 3
            initialDelaySeconds: 400
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: main
            timeoutSeconds: 1
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 900
            periodSeconds: 20
            successThreshold: 1
            tcpSocket:
              port: main
            timeoutSeconds: 1
          volumeMounts:
            - mountPath: /root/.cache/huggingface
              name: model-data
            - mountPath: /dev/shm
              name: dshm
      volumes:
        - name: model-data
          hostPath:
            path: /root/.cache/huggingface
            type: DirectoryOrCreate
        - emptyDir:
            medium: Memory
          name: dshm
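On clusters that mix GPU and CPU-only nodes, the Deployment above can additionally be pinned to GPU nodes with a `nodeSelector`. A minimal sketch; the `nvidia.com/gpu.present` label is an assumption about your cluster (NVIDIA's GPU feature discovery sets it, but your labels may differ):

```yaml
# Fragment to merge into spec.template.spec of the Deployment above.
# The node label below is an assumption -- adjust to your cluster's labels.
nodeSelector:
  nvidia.com/gpu.present: "true"
```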
15 changes: 15 additions & 0 deletions k8s/service.yaml
@@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
  labels:
    app: internlm2-chat-7b
  name: internlm2-chat-7b-svc
spec:
  ports:
    - name: main
      port: 23333
      protocol: TCP
      targetPort: main
  selector:
    app: internlm2-chat-7b
  type: ClusterIP
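A `ClusterIP` service is only reachable from inside the cluster. For quick external testing, one option is switching the service type to `NodePort`, which exposes the port on every node's IP. A sketch; the `nodePort` value 30333 is an arbitrary choice within Kubernetes' default 30000-32767 range:

```yaml
# Alternative spec for service.yaml, exposing the API outside the cluster.
spec:
  type: NodePort
  ports:
    - name: main
      port: 23333
      protocol: TCP
      targetPort: main
      nodePort: 30333   # arbitrary example value in the default NodePort range
  selector:
    app: internlm2-chat-7b
```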
