Is dp_size = tp_size still required for the DeepSeek model? #3359

Closed

luzengxiangcn opened this issue Feb 7, 2025 · 5 comments

Comments

@luzengxiangcn
Contributor

According to the DeepSeek optimizations, the DeepSeek model with DP+TP attention is now supported. Is dp_size = tp_size still required?

In the server arguments doc:

> enable_dp_attention: Enable [Data Parallelism Attention](https://lmsys.org/blog/2024-12-04-sglang-v0-4/#data-parallelism-attention-for-deepseek-models) for Deepseek models. Note that you need to choose dp_size = tp_size for this.
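For concreteness, here is a minimal launch sketch based on my reading of that doc; the model path, GPU count, and exact flag spellings (`--tp`, `--dp`, `--enable-dp-attention`) are assumptions and may differ across SGLang versions:

```bash
# Minimal sketch (assumed flags, verify against your SGLang version).
# Per the server-args doc, DP attention expects dp_size = tp_size,
# so --dp matches --tp here.
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --dp 8 \
  --enable-dp-attention \
  --trust-remote-code
```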
@zhaochenyang20
Collaborator

Well, I think this is still needed, but we've handled it automatically.

@luzengxiangcn
Contributor Author

cc @zhaochenyang20. Does that mean either DP or TP is supported, but I can't use both at the same time, e.g. tp=4, dp=2? We need this feature to deploy DeepSeek-V3 across multiple nodes: TP within each node, DP across all nodes.

@zhaochenyang20
Collaborator

The dp in data parallelism attention has a different meaning. Check this:

https://docs.sglang.ai/references/deepseek.html#multi-head-latent-attention-mla-throughput-optimizations

@luzengxiangcn
Contributor Author

luzengxiangcn commented Feb 7, 2025

@zhaochenyang20

> The dp in data parallelism attention has a different meaning. Check this:
>
> https://docs.sglang.ai/references/deepseek.html#multi-head-latent-attention-mla-throughput-optimizations

Gotcha!
What I am looking for is:

[Image: proposed deployment architecture diagram]

We are planning to apply this deployment architecture to reduce network pressure between nodes, while not increasing inference time too much when throughput is low.
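To illustrate the layout we have in mind, here is a rough per-node sketch; the commands, ports, and the idea of putting a router/load balancer in front are assumptions about how we would wire this up, not something the DP-attention feature provides:

```bash
# Rough sketch of the planned layout (assumed commands/flags):
# each node runs an independent TP-only replica, and "DP across nodes"
# comes from a front-end load balancer rather than --dp on the server.

# On node 0 (TP across the node's local GPUs):
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 --port 30000 --trust-remote-code

# On node 1 (identical, independent replica):
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 --port 30000 --trust-remote-code

# A router / HTTP load balancer in front of node0:30000 and node1:30000
# then spreads requests across the replicas.
```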

@zhaochenyang20
Collaborator

@luzengxiangcn Cool. Nice work. Hope to see it soon.
