Replies: 1 comment 1 reply
-
One strategy might be to use the CPU backend for general testing, so that nothing goes to VRAM. That could speed up the workflow for ironing out application-related issues; once those are resolved, switch back to the GPU backend. Here are a couple of other possibilities (from Claude; your mileage may vary 😉):
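As a minimal sketch of the CPU-backend idea: LLamaSharp's `ModelParams` exposes a `GpuLayerCount` setting, and setting it to zero keeps all layers on the CPU. Property names can differ between LLamaSharp versions, and `"model.gguf"` is a placeholder path:

```csharp
// Sketch (assumes a recent LLamaSharp): load the model entirely on the CPU by
// offloading zero layers to the GPU, so debugging sessions never touch VRAM.
using LLama;
using LLama.Common;

var parameters = new ModelParams("model.gguf")   // placeholder model path
{
    GpuLayerCount = 0   // keep every layer on the CPU; nothing allocated in VRAM
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
// ... run the usual application logic here, then raise GpuLayerCount again
// once the application-level issues are ironed out.
```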
-
I'm developing an application using the LLamaSharp library, a .NET binding for llama.cpp. Normally the model is cleared from memory after disposing LLamaWeights and LLamaContext, but once I stopped the application during debugging before these instances were disposed, and the VRAM stayed full. I had to restart my machine just to reclaim the memory. I'm sure this will happen a few more times during development. Even though I mitigated the issue by disposing the instances when an error is encountered, there's still a chance it can happen in a way that can't be caught.
Not sure if this matters, but I'm on CachyOS using llama.cpp-vulkan, with an RX 780M iGPU and an RX 9070 as an eGPU.
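For the "dispose on error" mitigation, one way to make the managed-exception path airtight is an unconditional try/finally around the model lifetime, a sketch assuming the usual `LLamaWeights.LoadFromFile`/`CreateContext` entry points (it cannot help when the process is killed from the debugger, since no managed code runs at all in that case):

```csharp
// Sketch: make disposal unconditional so that any exception escaping the
// application logic still frees the native weights and context (and their VRAM).
using LLama;
using LLama.Common;

var parameters = new ModelParams("model.gguf");   // placeholder model path

LLamaWeights? weights = null;
LLamaContext? context = null;
try
{
    weights = LLamaWeights.LoadFromFile(parameters);
    context = weights.CreateContext(parameters);
    // ... application logic that may throw ...
}
finally
{
    // Runs on success and on failure, in reverse order of creation.
    context?.Dispose();
    weights?.Dispose();
}
```

The `using var` declaration form gives the same guarantee more compactly when the instances live for the rest of the enclosing scope; neither form covers a hard process kill, which is the debugger scenario above.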