Why does TF Serving on GPU use so much GPU memory? #1929
Comments
I have also been facing this issue while deploying models with TF Serving. GPU memory usage keeps increasing with the number of inference requests, and the memory is not released after a request completes, which leads to OOM errors very quickly when multiple models are deployed on the same GPU. Although this behavior also exists with a pure TensorFlow (non-TF-Serving) deployment, I was expecting it would not be the case with TF Serving, since it has been designed for optimized production deployment. link1 and link2 were not of much help, and enabling gpu_growth did not solve it either. Is it possible to somehow release the unused GPU memory once an inference request is complete, or is there some other way to tackle this issue?
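As a side note on the gpu_growth experiment mentioned above: since TF Serving runs on the TensorFlow runtime, on-demand allocation can also be requested through TensorFlow's TF_FORCE_GPU_ALLOW_GROWTH environment variable when starting the container. This is only a sketch (my_model and /path/to/my_model are placeholders), and growth mode only changes when memory is grabbed, not whether it is later released:

# Ask the TF runtime to allocate GPU memory on demand instead of up front.
# my_model and /path/to/my_model are placeholders for your own servable.
docker run --gpus all -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model,target=/models/my_model \
  -e MODEL_NAME=my_model \
  -e TF_FORCE_GPU_ALLOW_GROWTH=true \
  -t tensorflow/serving:latest-gpu &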
Any updates on this?
The issue of not releasing unused memory is inherited from TF and currently there is no workaround. We will discuss it internally to see if we could fix it at least for TF Serving.
Thank you for the response! I will wait for the team to resolve it.
So, is whatever additional memory pool keeps getting occupied by TF Serving shared across all the models deployed in it? That is, if I have a TF model and a non-TF model deployed with TF Serving, will that memory pool be shared between both of them?
Under the hood, TensorFlow Serving uses the TensorFlow runtime to do the actual inference on your requests, and by default TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES). To mitigate this, we currently provide a flag that limits the fraction of GPU memory the model server is allowed to allocate.
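A minimal sketch of using that option, assuming the flag referred to above is tensorflow_model_server's --per_process_gpu_memory_fraction (the model path and the 0.3 fraction are placeholders):

# Cap TF's GPU memory reservation to ~30% instead of the default (nearly all).
# Assumes the flag discussed above is --per_process_gpu_memory_fraction;
# /path/to/my_model, my_model and 0.3 are placeholders.
docker run --gpus all -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model,target=/models/my_model \
  -e MODEL_NAME=my_model \
  -t tensorflow/serving:latest-gpu \
  --per_process_gpu_memory_fraction=0.3 &

The serving image forwards extra arguments after the image name to tensorflow_model_server, so the flag can be supplied this way without building a custom image.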
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for the past 7 days.
Bug Report
TF Serving (TFS) is using far more GPU memory than expected!
System information
Describe the problem
I installed and ran TF Serving with the sample, following this guide: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/docker.md
After the installation was done, I ran TF Serving with the sample:
docker run --gpus all -p 8501:8501 \
  --mount type=bind,\
source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,\
target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu &
However, GPU memory usage is 22553MiB / 24265MiB.
I don't know the reason for this. Could you please explain it?
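For what it's worth, the number reported here mostly reflects TensorFlow's default behavior of reserving nearly all GPU memory up front (see the maintainer's comment above), not the actual footprint of the tiny sample model. The server should still answer requests; the same guide queries it roughly like this (localhost and port 8501 as in the command above):

# Send a sample prediction request to the half_plus_two model over REST.
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
  -X POST http://localhost:8501/v1/models/half_plus_two:predict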
Source code / logs
Thank you!