Cannot load DeepSeek model on two nodes #3467

Open
Tian14267 opened this issue Feb 10, 2025 · 8 comments

@Tian14267

Tian14267 commented Feb 10, 2025

Hello,
I am running DeepSeek-R1-Distill-Llama-70B on two machines, but it gets stuck during init.

Here are my launch commands:
run_2_nodes.sh

# node 1
/data/miniconda3/envs/fffan_sglang/bin/python \
            -m sglang.launch_server \
            --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
            --tp 8 --dist-init-addr 10.68.27.12:5000 --nnodes 2 --node-rank 0 --trust-remote-code \
            --host 0.0.0.0 --port 3000

# node 2
/data/miniconda3/envs/fffan_sglang/bin/python \
      -m sglang.launch_server \
      --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
      --tp 8 --dist-init-addr 10.0.1.104:5000 --nnodes 2 --node-rank 1 --trust-remote-code

It gets stuck here:

INFO 02-10 11:18:47 __init__.py:190] Automatically detected platform cuda.
[2025-02-10 11:18:51] server_args=ServerArgs(model_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_mode='auto', load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', chat_template=None, is_embedding=False, revision=None, skip_tokenizer_init=False, host='0.0.0.0', port=3000, mem_fraction_static=0.81, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=8, stream_interval=1, stream_output=False, random_seed=454553257, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr='10.68.27.12:5000', nnodes=2, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False)
INFO 02-10 11:18:54 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 11:18:54 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 11:18:54 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 11:18:54 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 11:18:54 __init__.py:190] Automatically detected platform cuda.
[2025-02-10 11:18:58 TP1] Init torch distributed begin.
[2025-02-10 11:18:58 TP3] Init torch distributed begin.
[2025-02-10 11:18:58 TP0] Init torch distributed begin.
[2025-02-10 11:18:59 TP2] Init torch distributed begin.



@wangdaw2023

wangdaw2023 commented Feb 10, 2025

Try --tp 16 and set --dist-init-addr to the same value on both nodes.
Except for --node-rank, all other parameters should be identical.

@Tian14267
Author

Tian14267 commented Feb 10, 2025

Try --tp 16 and set --dist-init-addr to the same value on both nodes. Except for --node-rank, all other parameters should be identical.
@wangdaw2023
Thank you very much! I have 4 GPUs in each node.
I updated my commands:

# node 1
/data/miniconda3/envs/fffan_sglang/bin/python \
            -m sglang.launch_server \
            --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
            --tp 8 --dist-init-addr 10.68.27.12:5000 --nnodes 2 --node-rank 0 --trust-remote-code \
            --host 0.0.0.0 --port 3000

# node 2
/data/miniconda3/envs/fffan_sglang/bin/python \
      -m sglang.launch_server \
      --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
      --tp 8 --dist-init-addr 10.68.27.12:5000 --nnodes 2 --node-rank 1 --trust-remote-code

And I get this error:

[2025-02-10 12:17:35] server_args=ServerArgs(model_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_mode='auto', load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', chat_template=None, is_embedding=False, revision=None, skip_tokenizer_init=False, host='0.0.0.0', port=3000, mem_fraction_static=0.81, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=8, stream_interval=1, stream_output=False, random_seed=642830900, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr='10.68.27.12:5000', nnodes=2, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False)
INFO 02-10 12:17:38 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 12:17:38 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 12:17:38 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 12:17:38 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 12:17:38 __init__.py:190] Automatically detected platform cuda.
[2025-02-10 12:17:43 TP2] Init torch distributed begin.
[2025-02-10 12:17:43 TP0] Init torch distributed begin.
[2025-02-10 12:17:43 TP1] Init torch distributed begin.
[2025-02-10 12:17:43 TP3] Init torch distributed begin.
[rank1]:[E210 12:19:37.109537099 ProcessGroupGloo.cpp:143] Gloo connectFullMesh failed with [../third_party/gloo/gloo/transport/tcp/pair.cc:144] no error
[rank2]:[E210 12:19:37.109537025 ProcessGroupGloo.cpp:143] Gloo connectFullMesh failed with [../third_party/gloo/gloo/transport/tcp/pair.cc:144] no error
[rank3]:[E210 12:19:37.109542684 ProcessGroupGloo.cpp:143] Gloo connectFullMesh failed with [../third_party/gloo/gloo/transport/tcp/pair.cc:144] no error
[2025-02-10 12:19:37 TP2] Scheduler hit an exception: Traceback (most recent call last):
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1787, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 240, in __init__
    self.tp_worker = TpWorkerClass(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 68, in __init__
    self.model_runner = ModelRunner(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 178, in __init__
    min_per_gpu_memory = self.init_torch_distributed()
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 246, in init_torch_distributed
    init_distributed_environment(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 997, in init_distributed_environment
    _WORLD = init_world_group(ranks, local_rank, backend)
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 868, in init_world_group
    return GroupCoordinator(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 202, in __init__
    cpu_group = torch.distributed.new_group(ranks, backend="gloo")
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 97, in wrapper
    func_return = func(*args, **kwargs)
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 4565, in new_group
    return _new_group_with_tag(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 4648, in _new_group_with_tag
    pg, pg_store = _new_process_group_helper(
  File "/data/miniconda3/envs/fffan_sglang/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1744, in _new_process_group_helper
    backend_class = ProcessGroupGloo(
RuntimeError: Gloo connectFullMesh failed with [../third_party/gloo/gloo/transport/tcp/pair.cc:144] no error

@zhaochenyang20
Collaborator

First, I don't recommend running a 70B model across two nodes. That said, try this:

https://docs.sglang.ai/references/multi_node.html
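
For reference, a minimal sketch of the two-node pattern that page describes, assuming 4 GPUs per node (8 GPUs total, so --tp 8) and writing MASTER_IP as a placeholder for the rank-0 node's address; only --node-rank differs between the two commands:

# node 0 (rank 0), reachable at MASTER_IP
python3 -m sglang.launch_server \
    --model-path /path/to/DeepSeek-R1-Distill-Llama-70B \
    --tp 8 --dist-init-addr MASTER_IP:5000 --nnodes 2 --node-rank 0 \
    --trust-remote-code --host 0.0.0.0 --port 3000

# node 1 (rank 1): identical command, only --node-rank changes
python3 -m sglang.launch_server \
    --model-path /path/to/DeepSeek-R1-Distill-Llama-70B \
    --tp 8 --dist-init-addr MASTER_IP:5000 --nnodes 2 --node-rank 1 \
    --trust-remote-code --host 0.0.0.0 --port 3000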

@zhaochenyang20
Collaborator

Also, the problem is due to Gloo. Could you test the connectivity between your two nodes, just to check that Gloo can communicate?
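
For example, a quick reachability check from the second node (a sketch; 10.68.27.12:5000 is the --dist-init-addr used above):

# From node 2, verify the rank-0 node's rendezvous address is reachable
ping -c 3 10.68.27.12
# While the rank-0 server is starting up, check that the rendezvous TCP port is open
nc -zv 10.68.27.12 5000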

@zhaochenyang20 zhaochenyang20 self-assigned this Feb 10, 2025
@OneThingAI

(quoting @Tian14267's previous comment: the updated launch commands and the Gloo connectFullMesh error above)

Add the environment variables GLOO_SOCKET_IFNAME=your_nic_here and NCCL_SOCKET_IFNAME=your_nic_here on both nodes.
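
For instance, a sketch of how one might pick the interface name (the NIC that owns the IP used in --dist-init-addr; the exact name differs per machine):

# Find the interface that carries the --dist-init-addr IP (10.68.27.12 in this thread)
ip -br addr | grep 10.68.27.12
# Suppose it prints bond0; then, on BOTH nodes, export before launching sglang.launch_server
export GLOO_SOCKET_IFNAME=bond0
export NCCL_SOCKET_IFNAME=bond0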

@Tian14267 Tian14267 reopened this Feb 11, 2025
@Tian14267
Author

I set GLOO_SOCKET_IFNAME and NCCL_SOCKET_IFNAME, but it still gets stuck:

INFO 02-11 08:38:18 __init__.py:190] Automatically detected platform cuda.
[2025-02-11 08:38:22] server_args=ServerArgs(model_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_path='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', tokenizer_mode='auto', load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='/data/fffan/models/DeepSeek-R1-Distill-Llama-70B', chat_template=None, is_embedding=False, revision=None, skip_tokenizer_init=False, host='0.0.0.0', port=3000, mem_fraction_static=0.81, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=8, stream_interval=1, stream_output=False, random_seed=210968577, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr='10.68.27.12:5000', nnodes=2, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False)
INFO 02-11 08:38:24 __init__.py:190] Automatically detected platform cuda.
INFO 02-11 08:38:24 __init__.py:190] Automatically detected platform cuda.
INFO 02-11 08:38:25 __init__.py:190] Automatically detected platform cuda.
INFO 02-11 08:38:25 __init__.py:190] Automatically detected platform cuda.
INFO 02-11 08:38:25 __init__.py:190] Automatically detected platform cuda.
[2025-02-11 08:38:29 TP0] Init torch distributed begin.
[2025-02-11 08:38:29 TP1] Init torch distributed begin.
[2025-02-11 08:38:29 TP2] Init torch distributed begin.
[2025-02-11 08:38:29 TP3] Init torch distributed begin.

My commands are:

# Set on all machines
export NCCL_SOCKET_IFNAME=bond0         # force NCCL to use the Ethernet NIC
export GLOO_SOCKET_IFNAME=bond0         # force Gloo to use the Ethernet NIC
# Enable NCCL debug output
export NCCL_DEBUG=INFO
# Enable Gloo debug output (PyTorch)
export GLOO_DEBUG=1

# node 1
export CUDA_VISIBLE_DEVICES=4,5,6,7
/data/miniconda3/envs/fffan_sglang/bin/python \
            -m sglang.launch_server \
            --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
            --tp 8 --dist-init-addr 10.68.27.12:5000 --nnodes 2 --node-rank 0 --trust-remote-code \
            --host 0.0.0.0 --port 3000

# node 2
/data/miniconda3/envs/fffan_sglang/bin/python \
      -m sglang.launch_server \
      --model-path /data/fffan/models/DeepSeek-R1-Distill-Llama-70B \
      --tp 8 --dist-init-addr 10.68.27.12:5000 --nnodes 2 --node-rank 1 --trust-remote-code

I also tried export NCCL_SOCKET_IFNAME=ens8f0np0.

And here are my network interfaces:

(base) root@ubuntu:/data/fffan# ip addr 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens8f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 56:d5:e5:5f:15:e5 brd ff:ff:ff:ff:ff:ff permaddr 6c:92:cf:af:66:20
    altname enp151s0f0np0
3: ens8f1np1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 56:d5:e5:5f:15:e5 brd ff:ff:ff:ff:ff:ff permaddr 6c:92:cf:af:66:21
    altname enp151s0f1np1
4: ens16f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 6c:fe:54:a1:3b:6c brd ff:ff:ff:ff:ff:ff
    altname enp50s0f0
5: ens16f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 6c:fe:54:a1:3b:6d brd ff:ff:ff:ff:ff:ff
    altname enp50s0f1
6: ens16f2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 6c:fe:54:a1:3b:6e brd ff:ff:ff:ff:ff:ff
    altname enp50s0f2
7: ens16f3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 6c:fe:54:a1:3b:6f brd ff:ff:ff:ff:ff:ff
    altname enp50s0f3
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 56:d5:e5:5f:15:e5 brd ff:ff:ff:ff:ff:ff
    inet 10.68.27.12/24 brd 10.68.27.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::54d5:e5ff:fe5f:15e5/64 scope link 
       valid_lft forever preferred_lft forever
9: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:1e:81:f5:fe brd ff:ff:ff:ff:ff:ff
    inet 172.250.0.1/20 brd 172.250.15.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:1eff:fe81:f5fe/64 scope link 
       valid_lft forever preferred_lft forever
42: veth3766c09@if41: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether da:39:13:b5:45:e6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::d839:13ff:feb5:45e6/64 scope link 
       valid_lft forever preferred_lft forever
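
One way to isolate the Gloo issue from SGLang is a bare torch.distributed rendezvous over bond0 (a sketch; 29500 is an arbitrary free port chosen for the test, not one SGLang uses):

# node 1 (rank 0, 10.68.27.12)
GLOO_SOCKET_IFNAME=bond0 python3 -c "import torch.distributed as dist; dist.init_process_group('gloo', init_method='tcp://10.68.27.12:29500', rank=0, world_size=2); dist.barrier(); print('gloo OK')"

# node 2 (rank 1)
GLOO_SOCKET_IFNAME=bond0 python3 -c "import torch.distributed as dist; dist.init_process_group('gloo', init_method='tcp://10.68.27.12:29500', rank=1, world_size=2); dist.barrier(); print('gloo OK')"

If this also hangs or fails with connectFullMesh, the problem is likely in the network path or interface selection rather than in SGLang.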

@zhyncs
Member

zhyncs commented Feb 11, 2025

@Tian14267 python3 -m sglang.check_env
