Distributed inference GPU not being used and wrong memory value reported #11272

Answered by rgerganov
gngglobetech asked this question in Q&A

You need to start a separate rpc-server for each GPU that you have:

$ CUDA_VISIBLE_DEVICES=0 bin/rpc-server -H 0.0.0.0 -p 50052
$ CUDA_VISIBLE_DEVICES=1 bin/rpc-server -H 0.0.0.0 -p 50053

Then on the main host:

$ llama-cli <other-params> --rpc <rpc-host>:50052,<rpc-host>:50053
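
If you have more than two GPUs, typing these out by hand gets tedious. Here is a rough sketch of two helpers that only print the commands and the endpoint list, so you can check them before running anything. The function names `print_server_cmds` and `build_rpc_arg` are made up for illustration and are not part of llama.cpp. The sketch assumes the ports are consecutive starting from a base port, and that `--rpc` takes a comma-separated list of servers, as the llama.cpp RPC README describes.

```shell
#!/bin/sh
# Hypothetical helpers (not part of llama.cpp). They only print text;
# nothing is launched.

# print_server_cmds BASE_PORT NUM_GPUS
# Prints one rpc-server command per GPU, pinning each server to a single
# device via CUDA_VISIBLE_DEVICES and giving it its own port.
print_server_cmds() (
    base_port=$1
    num_gpus=$2
    i=0
    while [ "$i" -lt "$num_gpus" ]; do
        echo "CUDA_VISIBLE_DEVICES=$i bin/rpc-server -H 0.0.0.0 -p $((base_port + i))"
        i=$((i + 1))
    done
)

# build_rpc_arg HOST BASE_PORT NUM_GPUS
# Prints the matching comma-separated endpoint list for --rpc
# (assumption: consecutive ports on a single host).
build_rpc_arg() (
    host=$1
    base_port=$2
    num_gpus=$3
    i=0
    out=""
    while [ "$i" -lt "$num_gpus" ]; do
        # Append "host:port", inserting a comma only between entries.
        out="${out:+$out,}$host:$((base_port + i))"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
)
```

For example, `print_server_cmds 50052 2` prints the two rpc-server commands from the answer, and `build_rpc_arg <rpc-host> 50052 2` prints the endpoint list to pass to `--rpc` on the main host.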

Answer selected by gngglobetech
slaren Jan 17, 2025
Collaborator
