Can not load Deepseek model in two nodes #3467
Comments
Try --tp 16 and set --dist-init-addr to the same value on both nodes.
And I get this problem:
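To make the suggestion above concrete, here is a minimal sketch that builds the launch command for each node so that both share one --dist-init-addr. Only --tp and --dist-init-addr come from this thread; the --model-path, --nnodes, and --node-rank flags are how I recall SGLang's multi-node launch and should be checked against your installed version. The model path and address are placeholders, and --tp 16 assumes 8 GPUs on each of the two nodes.

```python
import shlex


def build_launch_command(model_path, dist_init_addr, node_rank, nnodes=2, tp=16):
    """Return the launch command for one node as a list of arguments."""
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp),                      # total tensor-parallel size across all nodes
        "--dist-init-addr", dist_init_addr,   # must be identical on every node
        "--nnodes", str(nnodes),
        "--node-rank", str(node_rank),        # the only value that differs per node
    ]


# Placeholder values (assumptions): replace with your own path and head-node address.
MODEL_PATH = "/path/to/DeepSeek-R1-Distill-Llama-70B"
DIST_INIT_ADDR = "10.0.0.1:5000"

commands = [build_launch_command(MODEL_PATH, DIST_INIT_ADDR, rank) for rank in range(2)]
for rank, cmd in enumerate(commands):
    print(f"node {rank}: {shlex.join(cmd)}")
```

Run the printed command for node 0 on the first machine and the one for node 1 on the second; the address should point at a port on node 0 that node 1 can reach.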
First, I don't recommend running a 70B model on two nodes. Then, try this:
Also, the problem is due to
Add the environment variables GLOO_SOCKET_IFNAME=your_nic_here and NCCL_SOCKET_IFNAME=your_nic_here.
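As a rough sketch of that advice, the snippet below lists the interface names the machine reports and builds an environment with both variables pointing at one of them. The helper names and the "eth0" placeholder are my own assumptions; pick the interface that actually carries traffic between your two nodes.

```python
import os
import socket


def list_interfaces():
    """Return the network interface names known to the OS (empty list if unavailable)."""
    try:
        return [name for _, name in socket.if_nameindex()]
    except (AttributeError, OSError):
        return []


def nic_env(nic_name, base_env=None):
    """Copy an environment and point both Gloo and NCCL at the same interface."""
    env = dict(os.environ if base_env is None else base_env)
    env["GLOO_SOCKET_IFNAME"] = nic_name
    env["NCCL_SOCKET_IFNAME"] = nic_name
    return env


print("interfaces:", list_interfaces())

# "eth0" is a placeholder (assumption); use the NIC shared by both nodes.
env = nic_env("eth0", base_env={})
print(env)
```

The resulting dictionary can be passed as env= to subprocess.Popen when starting the server, which has the same effect as exporting both variables in the shell on each node before launching.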
I set GLOO_SOCKET_IFNAME and NCCL_SOCKET_IFNAME, and the code gets stuck:
My code is:
I also use export, and my network card is here:
@Tian14267 python3 -m sglang.check_env
Hello,
I am running DeepSeek-R1-Distill-Llama-70B on two machines, but it gets stuck during initialization.
Here is my code:
run_2_nodes.sh
It gets stuck here: