doc: add example of deploying api server to Kubernetes (#1488)
* doc: add example of deploying api server to Kubernetes

* doc: fix lint error in yaml files

* doc: simplify file hierarchy
uzuku authored May 6, 2024
1 parent 79e5fff commit 499b75b
Showing 4 changed files with 109 additions and 0 deletions.
11 changes: 11 additions & 0 deletions docs/en/serving/api_server.md
@@ -37,6 +37,17 @@ docker run --runtime nvidia --gpus all \

The parameters of `api_server` are the same as those described in the "[option 1](#option-1-launching-with-lmdeploy-cli)" section.

### Option 3: Deploying to Kubernetes cluster

Connect to a running Kubernetes cluster and deploy the internlm2-chat-7b model service with the [kubectl](https://kubernetes.io/docs/reference/kubectl/) command-line tool (replace `<your token>` with your Hugging Face Hub token):

```shell
sed 's/{{HUGGING_FACE_HUB_TOKEN}}/<your token>/' k8s/deployment.yaml | kubectl create -f - \
&& kubectl create -f k8s/service.yaml
```

In the example above, the model data is placed on the node's local disk (hostPath). If multiple replicas are desired, consider replacing it with highly available shared storage, which can be mounted into the container using a [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/).
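As a sketch of that swap, a PersistentVolumeClaim backed by shared storage could replace the hostPath volume. The `storageClassName` of `nfs-client` and the `100Gi` size below are assumptions; substitute whatever storage class and capacity your cluster actually provides:

```yaml
# Hypothetical claim against a ReadWriteMany-capable storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-data-pvc
spec:
  accessModes:
    - ReadWriteMany        # required so multiple replicas can share the model cache
  storageClassName: nfs-client   # assumption -- use your cluster's class
  resources:
    requests:
      storage: 100Gi
```

The `model-data` volume in `k8s/deployment.yaml` would then reference `persistentVolumeClaim: {claimName: model-data-pvc}` instead of the hostPath entry.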

## RESTful API

LMDeploy's RESTful API is compatible with the following three OpenAI interfaces:
11 changes: 11 additions & 0 deletions docs/zh_cn/serving/api_server.md
@@ -57,6 +57,17 @@ COPY . .
CMD ["lmdeploy", "serve", "api_server", "liuhaotian/llava-v1.6-34b"]
```

### Option 3: Deploying to a Kubernetes cluster

Using the [kubectl](https://kubernetes.io/docs/reference/kubectl/) command-line tool, connect to a running Kubernetes cluster and deploy the internlm2-chat-7b model service. An example follows (replace `<your token>` with your Hugging Face Hub token):

```shell
sed 's/{{HUGGING_FACE_HUB_TOKEN}}/<your token>/' k8s/deployment.yaml | kubectl create -f - \
&& kubectl create -f k8s/service.yaml
```

In this example, the model data resides on the node's local disk (hostPath). For multi-replica deployments, consider replacing it with highly available shared storage, mounted into the container via a [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/).

## RESTful API

LMDeploy's RESTful API is compatible with the following three OpenAI interfaces:
72 changes: 72 additions & 0 deletions k8s/deployment.yaml
@@ -0,0 +1,72 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: internlm2-chat-7b
  name: internlm2-chat-7b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: internlm2-chat-7b
  strategy: {}
  template:
    metadata:
      labels:
        app: internlm2-chat-7b
    spec:
      containers:
        - name: internlm2-chat-7b
          image: openmmlab/lmdeploy:latest
          command:
            - /bin/sh
            - -c
          args:
            - "lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333"
          env:
            - name: NCCL_LAUNCH_MODE
              value: GROUP
            - name: HUGGING_FACE_HUB_TOKEN
              value: "{{HUGGING_FACE_HUB_TOKEN}}"
          ports:
            - containerPort: 23333
              protocol: TCP
              name: main
          resources:
            limits:
              cpu: "16"
              memory: 64Gi
              nvidia.com/gpu: "1"
            requests:
              cpu: "16"
              memory: 64Gi
              nvidia.com/gpu: "1"
          readinessProbe:
            failureThreshold: 3
            initialDelaySeconds: 400
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: main
            timeoutSeconds: 1
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 900
            periodSeconds: 20
            successThreshold: 1
            tcpSocket:
              port: main
            timeoutSeconds: 1
          volumeMounts:
            - mountPath: /root/.cache/huggingface
              name: model-data
            - mountPath: /dev/shm
              name: dshm
      volumes:
        - name: model-data
          hostPath:
            path: /root/.cache/huggingface
            type: DirectoryOrCreate
        - emptyDir:
            medium: Memory
          name: dshm
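On clusters that mix GPU and CPU-only nodes, the Deployment above can additionally be pinned to GPU nodes with a `nodeSelector`. A minimal sketch; the `nvidia.com/gpu.present` label is an assumption about your cluster (NVIDIA's GPU feature discovery sets it, but your labels may differ):

```yaml
# Fragment to merge into spec.template.spec of the Deployment above.
# The node label below is an assumption -- adjust to your cluster's labels.
nodeSelector:
  nvidia.com/gpu.present: "true"
```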
15 changes: 15 additions & 0 deletions k8s/service.yaml
@@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
  labels:
    app: internlm2-chat-7b
  name: internlm2-chat-7b-svc
spec:
  ports:
    - name: main
      port: 23333
      protocol: TCP
      targetPort: main
  selector:
    app: internlm2-chat-7b
  type: ClusterIP
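A `ClusterIP` service is only reachable from inside the cluster. For quick external testing, one option is switching the service type to `NodePort`, which exposes the port on every node's IP. A sketch; the `nodePort` value 30333 is an arbitrary choice within Kubernetes' default 30000-32767 range:

```yaml
# Alternative spec for service.yaml, exposing the API outside the cluster.
spec:
  type: NodePort
  ports:
    - name: main
      port: 23333
      protocol: TCP
      targetPort: main
      nodePort: 30333   # arbitrary example value in the default NodePort range
  selector:
    app: internlm2-chat-7b
```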
