Distributed inference GPU not being used and wrong memory value reported #11272
-
I am trying to do distributed inference on two older computers, but the backend PC is only reporting VRAM (11330 MB) for one of the two GPUs in that server. nvidia-smi and nvcc work without errors. The command I use to start the rpc server is:

The command I use to start the main server is:

```
./llama-cli -m /mnt/database/ds25/DeepSeek-V2.5-1210-Q6_K-00001-of-00005.gguf -p "Hello, my name is" -ngl 24 --rpc 192.168.1.46:50052
```

nvtop shows all the GPUs on the main server active, but only one GPU being used on the backend PC. Are there any settings I can use to fix this?

Main server: Dell PowerEdge C4130 - 4 x M40 (24GB)

I also noticed the active GPUs are only using their memory, and the GPU% for them is stuck at zero.
-
You need to start a separate `rpc-server` for each GPU that you have:

Then on the main host:
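The command blocks above did not survive in this copy of the thread, so here is a sketch of the usual pattern, assuming two GPUs on the backend PC and arbitrary example ports 50052/50053 (adjust paths, IPs, and ports to your setup; `CUDA_VISIBLE_DEVICES` is the standard CUDA mechanism for pinning a process to one device):

```shell
# On the backend PC (192.168.1.46): one rpc-server per GPU,
# each restricted to a single device and listening on its own port.
CUDA_VISIBLE_DEVICES=0 ./rpc-server --host 0.0.0.0 --port 50052 &
CUDA_VISIBLE_DEVICES=1 ./rpc-server --host 0.0.0.0 --port 50053 &

# On the main host: pass every endpoint, comma-separated, to --rpc.
./llama-cli -m /mnt/database/ds25/DeepSeek-V2.5-1210-Q6_K-00001-of-00005.gguf \
  -p "Hello, my name is" -ngl 24 \
  --rpc 192.168.1.46:50052,192.168.1.46:50053
```

With one `rpc-server` per device, the scheduler on the main host sees each remote GPU as a separate backend instead of only the first one.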
-
Thank you, I am now utilizing the GPUs over RPC. One more question: how can I squeeze a bit more power from these cards? The M40 24GB is only using 12213 MiB, and the M40 12GB is using ~7106 MiB.

print_info: max token length = 256
-
I am not sure why the models I am loading are not using more of my GPUs; DeepSeek and Llama 3.3 hover at around 50% utilization. What variable could I use to increase the GPU%?