Is dp_size = tp_size still required for the DeepSeek model? #3359
Comments
Well, I think this is still needed. But we've done this automatically.
cc @zhaochenyang20. Does this mean that either DP or TP is supported, but I can't use both at the same time, e.g. tp=4, dp=2? We need this feature to deploy DeepSeek-V3 across multiple nodes: TP within a node, DP across all nodes.
The dp in data-parallelism attention has a different meaning. Check this:
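To illustrate the distinction being made here, a minimal sketch (hypothetical helper, not SGLang's actual code) contrasting the two readings of "dp". The assumption that DP-attention workers are carved out of the existing TP group, with dp_size dividing tp_size, is mine, drawn from this discussion rather than from the SGLang source:

```python
def gpus_needed(tp_size: int, dp_size: int, dp_attention: bool) -> int:
    """Return the GPU count a deployment would use under each reading of dp.

    - Classic data parallelism: each DP replica holds a full TP group,
      so the deployment needs tp_size * dp_size GPUs.
    - DP attention (the DeepSeek-style optimization): DP attention workers
      reuse the GPUs of the TP group, so the count stays tp_size
      (assumption: dp_size must then divide tp_size).
    """
    if dp_attention:
        if tp_size % dp_size != 0:
            raise ValueError("with DP attention, dp_size must divide tp_size")
        return tp_size
    return tp_size * dp_size

# Classic DP with tp=4, dp=2 needs 8 GPUs (TP within a node, DP across nodes).
print(gpus_needed(4, 2, dp_attention=False))  # 8
# DP attention with the same sizes reuses the TP group's 4 GPUs.
print(gpus_needed(4, 2, dp_attention=True))   # 4
```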
Gotcha! We are planning to use this deployment architecture to reduce network pressure between nodes while not increasing inference latency too much when throughput is low.
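For reference, the multi-node layout described above (TP within a node, DP across nodes) might be launched roughly like this. This is a sketch only: the exact flag names and their interaction with `--enable-dp-attention` should be checked against the server arguments doc, and the model path, node count, and address are placeholders:

```shell
# Node 0 (placeholder addresses and sizes; verify flags against the docs)
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 --dp 2 \
  --nnodes 2 --node-rank 0 \
  --dist-init-addr 10.0.0.1:5000

# Node 1: same command with --node-rank 1
```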
@luzengxiangcn Cool. Nice work. Hope to see it soon. |
According to the DeepSeek optimization notes, the DeepSeek model with DP+TP attention is now supported. Is dp_size = tp_size still required?
In the server arguments doc: