Cannot load DeepSeek model on two nodes #3467

Open
Tian14267 opened this issue Feb 10, 2025 · 8 comments

@Tian14267

Tian14267 commented Feb 10, 2025

Hello,
I am running DeepSeek-R1-Distill-Llama-70B on two machines, but it gets stuck during init.

Here are my launch commands:
run_2_nodes.sh

# node 1
/data/miniconda3/envs/fffan_sglang/bin/python \
            -m sglang.launch_server \
            --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
            --tp 8 --dist-init-addr 10.68.27.12:5000 --nnodes 2 --node-rank 0 --trust-remote-code \
            --host 0.0.0.0 --port 3000

# node 2
/data/miniconda3/envs/fffan_sglang/bin/python \
      -m sglang.launch_server \
      --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
      --tp 8 --dist-init-addr 10.0.1.104:5000 --nnodes 2 --node-rank 1 --trust-remote-code

It gets stuck here:

INFO 02-10 11:18:47 __init__.py:190] Automatically detected platform cuda.
[2025-02-10 11:18:51] server_args=ServerArgs(model_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_mode='auto', load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', chat_template=None, is_embedding=False, revision=None, skip_tokenizer_init=False, host='0.0.0.0', port=3000, mem_fraction_static=0.81, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=8, stream_interval=1, stream_output=False, random_seed=454553257, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr='10.68.27.12:5000', nnodes=2, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False)
INFO 02-10 11:18:54 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 11:18:54 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 11:18:54 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 11:18:54 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 11:18:54 __init__.py:190] Automatically detected platform cuda.
[2025-02-10 11:18:58 TP1] Init torch distributed begin.
[2025-02-10 11:18:58 TP3] Init torch distributed begin.
[2025-02-10 11:18:58 TP0] Init torch distributed begin.
[2025-02-10 11:18:59 TP2] Init torch distributed begin.



@wangdaw2023

wangdaw2023 commented Feb 10, 2025

Try --tp 16 and set --dist-init-addr to the same value on both nodes.
Except for --node-rank, all other parameters should be identical.

@Tian14267
Author

Tian14267 commented Feb 10, 2025

Try --tp 16 and set --dist-init-addr to the same value on both nodes. Except for --node-rank, all other parameters should be identical.
@wangdaw2023
Thank you very much! I have 4 GPUs in each node.
I updated my commands:

# node 1
/data/miniconda3/envs/fffan_sglang/bin/python \
            -m sglang.launch_server \
            --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
            --tp 8 --dist-init-addr 10.68.27.12:5000 --nnodes 2 --node-rank 0 --trust-remote-code \
            --host 0.0.0.0 --port 3000

# node 2
/data/miniconda3/envs/fffan_sglang/bin/python \
      -m sglang.launch_server \
      --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
      --tp 8 --dist-init-addr 10.68.27.12:5000 --nnodes 2 --node-rank 1 --trust-remote-code

And I get this error:

[2025-02-10 12:17:35] server_args=ServerArgs(model_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_mode='auto', load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', chat_template=None, is_embedding=False, revision=None, skip_tokenizer_init=False, host='0.0.0.0', port=3000, mem_fraction_static=0.81, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=8, stream_interval=1, stream_output=False, random_seed=642830900, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr='10.68.27.12:5000', nnodes=2, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False)
INFO 02-10 12:17:38 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 12:17:38 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 12:17:38 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 12:17:38 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 12:17:38 __init__.py:190] Automatically detected platform cuda.
[2025-02-10 12:17:43 TP2] Init torch distributed begin.
[2025-02-10 12:17:43 TP0] Init torch distributed begin.
[2025-02-10 12:17:43 TP1] Init torch distributed begin.
[2025-02-10 12:17:43 TP3] Init torch distributed begin.
[rank1]:[E210 12:19:37.109537099 ProcessGroupGloo.cpp:143] Gloo connectFullMesh failed with [../third_party/gloo/gloo/transport/tcp/pair.cc:144] no error
[rank2]:[E210 12:19:37.109537025 ProcessGroupGloo.cpp:143] Gloo connectFullMesh failed with [../third_party/gloo/gloo/transport/tcp/pair.cc:144] no error
[rank3]:[E210 12:19:37.109542684 ProcessGroupGloo.cpp:143] Gloo connectFullMesh failed with [../third_party/gloo/gloo/transport/tcp/pair.cc:144] no error
[2025-02-10 12:19:37 TP2] Scheduler hit an exception: Traceback (most recent call last):
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1787, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 240, in __init__
    self.tp_worker = TpWorkerClass(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 68, in __init__
    self.model_runner = ModelRunner(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 178, in __init__
    min_per_gpu_memory = self.init_torch_distributed()
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 246, in init_torch_distributed
    init_distributed_environment(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 997, in init_distributed_environment
    _WORLD = init_world_group(ranks, local_rank, backend)
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 868, in init_world_group
    return GroupCoordinator(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 202, in __init__
    cpu_group = torch.distributed.new_group(ranks, backend="gloo")
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 97, in wrapper
    func_return = func(*args, **kwargs)
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 4565, in new_group
    return _new_group_with_tag(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 4648, in _new_group_with_tag
    pg, pg_store = _new_process_group_helper(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1744, in _new_process_group_helper
    backend_class = ProcessGroupGloo(
RuntimeError: Gloo connectFullMesh failed with [../third_party/gloo/gloo/transport/tcp/pair.cc:144] no error

@zhaochenyang20
Collaborator

First, I don't recommend running a 70B model across two nodes. That said, try this:

https://docs.sglang.ai/references/multi_node.html
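
For reference, a minimal sketch of the two-node pattern that page describes, assuming 4 GPUs per node (8 GPUs total, so --tp 8) and writing MASTER_IP as a placeholder for the rank-0 node's address; only --node-rank differs between the two commands:

# node 0 (rank 0), reachable at MASTER_IP
python3 -m sglang.launch_server \
    --model-path /path/to/DeepSeek-R1-Distill-Llama-70B \
    --tp 8 --dist-init-addr MASTER_IP:5000 --nnodes 2 --node-rank 0 \
    --trust-remote-code --host 0.0.0.0 --port 3000

# node 1 (rank 1): identical command, only --node-rank changes
python3 -m sglang.launch_server \
    --model-path /path/to/DeepSeek-R1-Distill-Llama-70B \
    --tp 8 --dist-init-addr MASTER_IP:5000 --nnodes 2 --node-rank 1 \
    --trust-remote-code --host 0.0.0.0 --port 3000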

@zhaochenyang20
Collaborator

Also, the problem is due to Gloo. Could you test the connectivity between your two nodes, just to check that Gloo can communicate?
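
For example, a quick reachability check from the second node (a sketch; 10.68.27.12:5000 is the --dist-init-addr used above):

# From node 2, verify the rank-0 node's rendezvous address is reachable
ping -c 3 10.68.27.12
# While the rank-0 server is starting up, check that the rendezvous TCP port is open
nc -zv 10.68.27.12 5000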

@zhaochenyang20 zhaochenyang20 self-assigned this Feb 10, 2025
@OneThingAI

(quoting @Tian14267's previous comment: the updated launch commands and the Gloo connectFullMesh error above)

Add the environment variables GLOO_SOCKET_IFNAME=your_nic_here and NCCL_SOCKET_IFNAME=your_nic_here on both nodes.
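
For instance, a sketch of how one might pick the interface name (the NIC that owns the IP used in --dist-init-addr; the exact name differs per machine):

# Find the interface that carries the --dist-init-addr IP (10.68.27.12 in this thread)
ip -br addr | grep 10.68.27.12
# Suppose it prints bond0; then, on BOTH nodes, export before launching sglang.launch_server
export GLOO_SOCKET_IFNAME=bond0
export NCCL_SOCKET_IFNAME=bond0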

@Tian14267 Tian14267 reopened this Feb 11, 2025
@Tian14267
Author

I set GLOO_SOCKET_IFNAME and NCCL_SOCKET_IFNAME, but it still gets stuck:

INFO 02-11 08:38:18 __init__.py:190] Automatically detected platform cuda.
[2025-02-11 08:38:22] server_args=ServerArgs(model_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_mode='auto', load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', chat_template=None, is_embedding=False, revision=None, skip_tokenizer_init=False, host='0.0.0.0', port=3000, mem_fraction_static=0.81, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=8, stream_interval=1, stream_output=False, random_seed=210968577, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr='10.68.27.12:5000', nnodes=2, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False)
INFO 02-11 08:38:24 __init__.py:190] Automatically detected platform cuda.
INFO 02-11 08:38:24 __init__.py:190] Automatically detected platform cuda.
INFO 02-11 08:38:25 __init__.py:190] Automatically detected platform cuda.
INFO 02-11 08:38:25 __init__.py:190] Automatically detected platform cuda.
INFO 02-11 08:38:25 __init__.py:190] Automatically detected platform cuda.
[2025-02-11 08:38:29 TP0] Init torch distributed begin.
[2025-02-11 08:38:29 TP1] Init torch distributed begin.
[2025-02-11 08:38:29 TP2] Init torch distributed begin.
[2025-02-11 08:38:29 TP3] Init torch distributed begin.

My commands are:

# Set on all machines
export NCCL_SOCKET_IFNAME=bond0         # force NCCL to use the Ethernet NIC
export GLOO_SOCKET_IFNAME=bond0         # force Gloo to use the Ethernet NIC
# Enable NCCL debug output
export NCCL_DEBUG=INFO
# Enable Gloo debug output (PyTorch)
export GLOO_DEBUG=1

# node 1
export CUDA_VISIBLE_DEVICES=4,5,6,7
/data/miniconda3/envs/fffan_sglang/bin/python \
            -m sglang.launch_server \
            --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
            --tp 8 --dist-init-addr 10.68.27.12:5000 --nnodes 2 --node-rank 0 --trust-remote-code \
            --host 0.0.0.0 --port 3000

# node 2
/data/miniconda3/envs/fffan_sglang/bin/python \
      -m sglang.launch_server \
      --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
      --tp 8 --dist-init-addr 10.68.27.12:5000 --nnodes 2 --node-rank 1 --trust-remote-code

I also tried export NCCL_SOCKET_IFNAME=ens8f0np0.

And here are my network interfaces:

(base) root@ubuntu:/data/fffan# ip addr 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens8f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 56:d5:e5:5f:15:e5 brd ff:ff:ff:ff:ff:ff permaddr 6c:92:cf:af:66:20
    altname enp151s0f0np0
3: ens8f1np1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 56:d5:e5:5f:15:e5 brd ff:ff:ff:ff:ff:ff permaddr 6c:92:cf:af:66:21
    altname enp151s0f1np1
4: ens16f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 6c:fe:54:a1:3b:6c brd ff:ff:ff:ff:ff:ff
    altname enp50s0f0
5: ens16f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 6c:fe:54:a1:3b:6d brd ff:ff:ff:ff:ff:ff
    altname enp50s0f1
6: ens16f2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 6c:fe:54:a1:3b:6e brd ff:ff:ff:ff:ff:ff
    altname enp50s0f2
7: ens16f3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 6c:fe:54:a1:3b:6f brd ff:ff:ff:ff:ff:ff
    altname enp50s0f3
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 56:d5:e5:5f:15:e5 brd ff:ff:ff:ff:ff:ff
    inet 10.68.27.12/24 brd 10.68.27.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::54d5:e5ff:fe5f:15e5/64 scope link 
       valid_lft forever preferred_lft forever
9: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:1e:81:f5:fe brd ff:ff:ff:ff:ff:ff
    inet 172.250.0.1/20 brd 172.250.15.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:1eff:fe81:f5fe/64 scope link 
       valid_lft forever preferred_lft forever
42: veth3766c09@if41: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether da:39:13:b5:45:e6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::d839:13ff:feb5:45e6/64 scope link 
       valid_lft forever preferred_lft forever
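
One way to isolate the Gloo issue from SGLang is a bare torch.distributed rendezvous over bond0 (a sketch; 29500 is an arbitrary free port chosen for the test, not one SGLang uses):

# node 1 (rank 0, 10.68.27.12)
GLOO_SOCKET_IFNAME=bond0 python3 -c "import torch.distributed as dist; dist.init_process_group('gloo', init_method='tcp://10.68.27.12:29500', rank=0, world_size=2); dist.barrier(); print('gloo OK')"

# node 2 (rank 1)
GLOO_SOCKET_IFNAME=bond0 python3 -c "import torch.distributed as dist; dist.init_process_group('gloo', init_method='tcp://10.68.27.12:29500', rank=1, world_size=2); dist.barrier(); print('gloo OK')"

If this also hangs or fails with connectFullMesh, the problem is likely in the network path or interface selection rather than in SGLang.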

@zhyncs
Member

zhyncs commented Feb 11, 2025

@Tian14267 python3 -m sglang.check_env
