I am using 3 A800 nodes to deploy DeepSeek R1, but one node has only one IB adapter. How should I adjust the --tp value in the deploy command? #3517

Closed
19157681683 opened this issue Feb 12, 2025 · 1 comment

@19157681683

node 1

export NCCL_IB_HCA=mlx5_0
python3 -m sglang.launch_server --model-path /x32001214/model/bf16/DeepSeek-R1-BF16 --tp 12 --dist-init-addr 0.0.0.0:9997 --nnodes 3 --node-rank 0 --trust-remote-code --host 0.0.0.0 --port 8888

node 2

export NCCL_IB_HCA=mlx5_1
python3 -m sglang.launch_server --model-path /x32001214/model/bf16/DeepSeek-R1-BF16 --tp 24 --dist-init-addr 10.160.199.103:30172 --nnodes 3 --node-rank 1 --trust-remote-code

node 3

export NCCL_IB_HCA=mlx5_1
python3 -m sglang.launch_server --model-path /x32001214/model/bf16/DeepSeek-R1-BF16 --tp 24 --dist-init-addr 10.160.199.103:30172 --nnodes 3 --node-rank 2 --trust-remote-code

@jhinpan
Collaborator

jhinpan commented Feb 12, 2025

You can certainly try your current deploy commands:

  • For the node with one IB adapter:
    • Use a lower tensor parallelism value. In your case, set --tp 12 and configure it to use its available IB (e.g., export NCCL_IB_HCA=mlx5_0).
  • For the nodes with two IB adapters:
    • Use a higher tensor parallelism value since they can support more communication bandwidth. In your case, set --tp 24 and assign the proper IB (e.g., export NCCL_IB_HCA=mlx5_1).

If this doesn't work, or if you prefer a uniform configuration, force all nodes to use the same IB adapter (for example, set NCCL_IB_HCA=mlx5_0 on every node) and use the same --tp value (e.g., --tp 12) across all nodes. This may simplify deployment, but it might not fully use the extra IB capacity on the nodes that have two adapters.
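
For reference, the uniform configuration could be sketched as a small script that only prints the per-node commands for review and does not launch anything. The values are taken from this thread and are not verified: `TP=12` is just the example value mentioned above and should be adjusted to the actual hardware, `INIT_ADDR` assumes that 10.160.199.103:30172 is reachable from every node, and using one shared `--dist-init-addr` on all nodes is an assumption of this sketch. The function name `print_launch_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands to run on each node, executes nothing.

MODEL_PATH="/x32001214/model/bf16/DeepSeek-R1-BF16"  # model path from the issue
INIT_ADDR="10.160.199.103:30172"  # assumption: same rendezvous address on every node
NNODES=3
TP=12             # example value from the reply; adjust to your hardware
IB_HCA="mlx5_0"   # assumption: this adapter exists on all three nodes

# Print the environment setup and launch command for one node rank.
print_launch_cmd() {
  local rank="$1"
  local extra=""
  # Only the first node exposes the HTTP endpoint, as in the original command.
  if [ "${rank}" -eq 0 ]; then
    extra=" --host 0.0.0.0 --port 8888"
  fi
  echo "export NCCL_IB_HCA=${IB_HCA}"
  echo "python3 -m sglang.launch_server --model-path ${MODEL_PATH} --tp ${TP} --dist-init-addr ${INIT_ADDR} --nnodes ${NNODES} --node-rank ${rank} --trust-remote-code${extra}"
}

for rank in 0 1 2; do
  echo "# node rank ${rank}"
  print_launch_cmd "${rank}"
done
```

Before running the printed commands, it may be worth listing the contents of /sys/class/infiniband on each node to confirm that the adapter named in `IB_HCA` is actually present there.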

@jhinpan jhinpan self-assigned this Feb 12, 2025
@jhinpan jhinpan closed this as completed Feb 13, 2025