
deepseek-R1 AssertionError occurred in the batch request of the client #3477

Closed
Roysky opened this issue Feb 11, 2025 · 8 comments

Roysky commented Feb 11, 2025

While using deepseek-R1 for inference on 2 nodes * 8 GPUs (H800), an AssertionError occurred during the client batch request.

The specific error is as follows:

[2025-02-11 01:42:04] INFO: 10.81.10.40:51432 - "GET /v1/batches/batch_2c036fce-9c71-4d76-9fdb-4701d9f59861 HTTP/1.1" 200 OK
[2025-02-11 01:42:04] DetokenizerManager hit an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/detokenizer_manager.py", line 240, in run_detokenizer_process
manager.event_loop()
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/detokenizer_manager.py", line 143, in event_loop
self.trim_matched_stop(
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/detokenizer_manager.py", line 105, in trim_matched_stop
assert len(output) > 0
AssertionError
[2025-02-11 01:42:04] Received sigquit from a child proces. It usually means the child failed.

The environment configuration is as follows:

  • sglang version: 0.4.2.post3
  • environment: 2 nodes × 8 H800 GPUs

Startup command:

node1

python -m sglang.launch_server --model-path DeepSeek-R1 --tp 16 --nccl-init-addr 10.1.10.42:5000 --nnodes 2 --node-rank 0 --trust-remote-code --host 0.0.0.0

node2

python -m sglang.launch_server --model-path DeepSeek-R1 --tp 16 --nccl-init-addr 10.1.10.42:5000 --nnodes 2 --node-rank 1 --trust-remote-code --host 0.0.0.0
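For context, the failing client side follows the OpenAI-style batch flow: build a JSONL input file with one request per line, upload it, and create a batch. A minimal sketch of building that input file is below; the model name, file name, and prompt contents are assumptions, not taken from the reporter's actual client code.

```python
import json

def build_batch_input(prompts, model="DeepSeek-R1"):
    """Build the JSONL lines for an OpenAI-style /v1/batches input file.

    One line per request; custom_id must be unique within the file.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 128,
            },
        }))
    return "\n".join(lines)

if __name__ == "__main__":
    # Write the input file; it is then uploaded (purpose="batch") and
    # submitted via POST /v1/batches on the OpenAI-compatible server.
    jsonl = build_batch_input(["What is 2 + 2?", "Name a prime number."])
    with open("batch_input.jsonl", "w") as f:
        f.write(jsonl)
```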

@zhaochenyang20 (Collaborator)

Oh. For now, please do not use the batch API with DeepSeek models; we are aware of this problem. Batch requests can easily be replaced with individual chat completion requests.
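As a workaround, the suggestion above can be sketched as sending one chat completion request per prompt instead of a batch. This is a minimal stdlib-only sketch, not sglang's client code; the server address (sglang's default port 30000) and model name are assumptions:

```python
import json
import urllib.request

def build_payload(prompt, model="DeepSeek-R1"):
    """Build a single /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def chat_completion(prompt, base_url="http://127.0.0.1:30000"):
    """Send one chat completion request instead of using the batch API."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (assuming a running server at http://127.0.0.1:30000):
#   out = chat_completion("Hello!")
#   print(out["choices"][0]["message"]["content"])
```

Looping over the prompts and calling `chat_completion` for each one sidesteps the batch code path where the AssertionError occurs.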

@tanconghui (Contributor)

Do you have a plan to fix this issue? We need the batch API in our scenario.

@zhaochenyang20 (Collaborator)

Yeah. As mentioned, @FrankLeeeee is working on batch support for DeepSeek models. Stay tuned, thanks!

@FrankLeeeee (Collaborator)

@tanconghui @Roysky Do you still encounter this issue with the latest release? I cannot reproduce the error. If you can provide a script that reproduces it, that would help as well.

FrankLeeeee mentioned this issue Feb 21, 2025
@FrankLeeeee (Collaborator)

@tanconghui @Roysky You can take a look at #3754; I no longer encounter the error with this fix.

@tanconghui (Contributor)

Thanks, FrankLeeeee. I also noticed this issue. But maybe it would be better to use a UUID instead of the custom_id as the request id? For example, if two batches are processed at the same time and samples with the same custom_id exist in both batches, the current solution in #3754 still seems problematic.
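The collision concern above can be illustrated with a small sketch. `make_request_id` is a hypothetical helper, not sglang's actual code: the idea is that `custom_id` only has to be unique within one batch file, so an internal request id should mix in the batch id and a fresh UUID.

```python
import uuid

def make_request_id(batch_id, custom_id):
    """Derive a collision-free internal request id.

    custom_id is only unique within a single batch, so two batches running
    concurrently may reuse the same custom_id; appending the batch id and a
    random UUID keeps internal ids distinct (hypothetical helper).
    """
    return f"{batch_id}-{custom_id}-{uuid.uuid4().hex}"

# Two concurrent batches reusing the same custom_id no longer collide:
rid_a = make_request_id("batch_a", "request-0")
rid_b = make_request_id("batch_b", "request-0")
assert rid_a != rid_b
```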


@zhaochenyang20 (Collaborator)

@tanconghui We just merged this into main. Thanks!
