GPU memory leak with high load for ONNX model #198

Open
junwang-wish opened this issue Jun 14, 2023 · 3 comments


@junwang-wish

Description
GPU memory leak under high load: GPU memory usage climbs while requests are arriving and never comes back down after the high-load requests stop (the memory is never released).

Triton Information
What version of Triton are you using?

23.02

Are you using the Triton container or did you build it yourself?

Triton container

To Reproduce
Serving any ONNX model under high load results in monotonically increasing GPU memory usage.
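
A minimal load-generation sketch using the Python `tritonclient` package, assuming an HTTP endpoint at `localhost:8000`; the model name `my_onnx_model`, input name `INPUT__0`, shape, and datatype are placeholders to replace with your own model's details:

```python
# Sketch: sustained inference load against a Triton-served ONNX model.
# Model/input names and shape below are assumptions, not from the issue.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def one_request():
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    inp = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    return client.infer("my_onnx_model", inputs=[inp])

# Fire many requests to simulate high load, then stop and watch GPU memory
# (e.g. with nvidia-smi); per this issue, it never comes back down.
for _ in range(100_000):
    one_request()
```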

Expected behavior
When requests stop coming, GPU memory should be released
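
A small sketch for verifying this, assuming the `pynvml` package and GPU index 0; it simply polls NVML so you can see whether used memory drops after the load stops:

```python
# Sketch: poll GPU memory usage via NVML (GPU index 0 assumed).
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

while True:
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory used: {info.used / 1024**2:.0f} MiB")
    time.sleep(10)
```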

@kthui
Contributor

kthui commented Jun 20, 2023

I wonder if the memory usage would come down if the model is unloaded (i.e. via the unload API).

cc @tanmayv25 to confirm whether this memory usage is expected.
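
A sketch of the unload suggestion above, assuming the server runs with `--model-control-mode=explicit` so the model-control API is enabled; the model name is a placeholder:

```python
# Sketch: explicitly unload (and later reload) a model via tritonclient.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
client.unload_model("my_onnx_model")  # releases the model's resources
# ... later, load it back before serving requests again
client.load_model("my_onnx_model")
```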

@junwang-wish
Author

Thanks @kthui, but I don't want to unload the model manually, since it is still used, just infrequently. Ideally, if a model goes unused for a prolonged period of time (say 2 hours), its GPU memory would be freed automatically.
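
One way to approximate this outside the server is a watcher that unloads a model after it has been idle for a while. A sketch, assuming `--model-control-mode=explicit`, a placeholder model name, and the per-model statistics layout returned by `get_inference_statistics` on this Triton version (worth verifying):

```python
# Sketch: unload a model after ~2 hours with no successful inferences,
# by polling Triton's per-model statistics. Field layout is an assumption.
import time
import tritonclient.http as httpclient

MODEL = "my_onnx_model"
IDLE_LIMIT_S = 2 * 60 * 60

client = httpclient.InferenceServerClient(url="localhost:8000")

def inference_count():
    stats = client.get_inference_statistics(model_name=MODEL)
    # Assumed layout: first entry's successful-inference count.
    return stats["model_stats"][0]["inference_stats"]["success"]["count"]

last_count = inference_count()
last_active = time.monotonic()

while True:
    time.sleep(60)
    count = inference_count()
    if count != last_count:
        last_count, last_active = count, time.monotonic()
    elif time.monotonic() - last_active > IDLE_LIMIT_S:
        client.unload_model(MODEL)  # free GPU memory for the idle model
        break
```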

@tanmayv25
Contributor

@junwang-wish which execution provider in ORT are you using? Are you using TRT or CUDA?
Does your model have dynamic-shaped inputs?
I am transferring the issue to the ORT backend team, as it seems to be an issue with ORT.
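
A quick way to answer the dynamic-shape question is to inspect the model's graph inputs with the `onnx` package; the model path below is a placeholder:

```python
# Sketch: list each graph input's dimensions; a symbolic name (dim_param)
# or a missing dim_value indicates a dynamic dimension.
import onnx

model = onnx.load("model.onnx")
for inp in model.graph.input:
    dims = [d.dim_param if d.dim_param else d.dim_value
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)
```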

@tanmayv25 transferred this issue from triton-inference-server/server Jun 22, 2023