Why does TF Serving GPU use so much GPU memory? #1929

Closed
duonghb53 opened this issue Oct 31, 2021 · 10 comments
Assignees: pindinagesh
Labels: stale, stat:awaiting response, type:bug

Comments

@duonghb53

Bug Report

TFS uses a very large amount of GPU memory!

System information

  • OS Platform and Distribution: CentOS 8.4, CUDA 11.4
  • TensorFlow Serving installed from: Docker
  • TensorFlow Serving version: latest-gpu

Describe the problem

I tried to install and run TFS with the sample, following this guide: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/docker.md

After the installation, I ran TFS with the sample:

docker run --gpus all -p 8501:8501 \
  --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu &

but the GPU memory usage is 22553MiB/24265MiB.
I don't know the reason for this.
Could you please explain it?

Source code / logs

[image attached]

Thank you!

@pindinagesh pindinagesh self-assigned this Nov 2, 2021
@pindinagesh

@duonghb53

Could you please refer to the similar issues described in link1 and link2, and let us know if this helps? Thanks.

@shv07

shv07 commented Nov 8, 2021

@pindinagesh

I have also been facing this issue while deploying models with TF Serving.

The GPU usage keeps increasing with the number of inference requests, and the used memory is not released after a request completes, which quickly leads to an OOM error when multiple models are deployed on the same GPU.

Although this behavior also occurs with a pure TensorFlow-based (non-TF-Serving) deployment, I was expecting it would not be the case with TF Serving, since it is designed for optimized production deployment.

link1 and link2 were not of much help. Enabling GPU memory growth (using TF_FORCE_GPU_ALLOW_GROWTH=true) leads to low GPU usage initially, but the same issue eventually appears as the number of requests increases. Restricting GPU usage to a certain fraction is also not a feasible approach when the same GPU is used to deploy multiple models.
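
For reference, a minimal sketch of how the growth option can be passed to the serving container, reusing the half_plus_two example from earlier in this thread (the paths and model name are just that sample's placeholders):

# Minimal sketch: enable on-demand GPU memory growth for TF Serving by setting
# the TF_FORCE_GPU_ALLOW_GROWTH environment variable on the container.
# Paths and MODEL_NAME reuse the half_plus_two sample from this thread.
docker run --gpus all -p 8501:8501 \
  --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two \
  -e TF_FORCE_GPU_ALLOW_GROWTH=true \
  -t tensorflow/serving:latest-gpu &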

Is it possible to somehow release the unused GPU memory once an inference request is complete, or is there some other way to tackle this issue?

@shv07

shv07 commented Nov 16, 2021

Any updates on this?

@guanxinq
Contributor

The issue of not releasing unused memory is inherited from TF and currently there is no workaround. We will discuss it internally to see if we could fix it at least for TF serving.

@duonghb53
Author

The issue of not releasing unused memory is inherited from TF and currently there is no workaround. We will discuss it internally to see if we could fix it at least for TF serving.

Thank you for the response! I will wait for the team to resolve it.

@shv07

shv07 commented Nov 22, 2021

The issue of not releasing unused memory is inherited from TF and currently there is no workaround. We will discuss it internally to see if we could fix it at least for TF serving.

So, is the additional memory pool that keeps getting occupied by TF Serving shared across all the models deployed in it? That is, if I have a TF and a non-TF model deployed with TF Serving, will that memory pool be shared between both of them?

@singhniraj08

@duonghb53,

Under the hood, TensorFlow Serving uses the TensorFlow runtime to do the actual inference on your requests, and by default TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES).

To mitigate this, we currently provide the command-line flag "per_process_gpu_memory_fraction", which defines the fraction of the GPU memory space that each process occupies. Kindly let us know if this flag resolves your high GPU usage.
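
As an illustration, a minimal sketch of passing this flag through to the model server when starting the container (arguments after the image name are forwarded to tensorflow_model_server; the 0.4 fraction is an arbitrary example value, not a recommendation):

# Minimal sketch: cap TF Serving's GPU memory to a fraction of the device with
# the --per_process_gpu_memory_fraction flag of tensorflow_model_server.
# The 0.4 value is only an illustrative assumption.
docker run --gpus all -p 8501:8501 \
  --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two \
  -t tensorflow/serving:latest-gpu \
  --per_process_gpu_memory_fraction=0.4 &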
Thank you!

@github-actions

github-actions bot commented May 5, 2023

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label May 5, 2023
@github-actions

This issue was closed due to lack of activity after being marked stale for past 7 days.
