Monitoring actual GPU memory usage #1407

Open
ctuluhu opened this issue Aug 2, 2019 · 16 comments

Comments

@ctuluhu

ctuluhu commented Aug 2, 2019

Describe the problem the feature is intended to solve

I have several models loaded and I am not sure how to tell whether TensorFlow still has some memory left. I can check with nvidia-smi how much memory TensorFlow has allocated, but I couldn't find a way to check how much of it the loaded models are using.

Describe the solution

TensorFlow could provide Prometheus metrics reporting the actual GPU memory usage of each loaded model.

Describe alternatives you've considered

None.

Additional context

I am not sure whether this is actually a feature request or whether it can already be done somehow.

@rmothukuru rmothukuru self-assigned this Aug 5, 2019
@rmothukuru

@ctuluhu,
Can you please check this link and let us know if it helps?

@ctuluhu
Author

ctuluhu commented Aug 5, 2019

@rmothukuru
Thank you for your response, but I didn't find what I am looking for in the provided link.
What I need is a way to know how much free memory remains within the memory TensorFlow has allocated.

Example: TensorFlow allocated 6 GB of GPU memory, and I then loaded two models into it. How can I know how much of that 6 GB is used by the loaded models and how much is still free?

@peddybeats
Member

Hi there, we can easily export metrics that tell you host memory consumption on a per-model basis, but I think you're specifically looking for GPU memory consumption/availability, correct?
This is not straightforward, but we will discuss it internally to understand exactly how difficult it would be and provide an update. Let us know if you have any additional info that you think might be useful for us to know!

@ctuluhu
Author

ctuluhu commented Aug 12, 2019

Hi, yes, I am looking for a way to check GPU memory availability. Such a feature would be great.

@troycheng

We also ran into this problem. TF Serving occupies all GPU memory when it starts, and there is no way to know how much memory a specific model really needs. If we deploy too many models on one server instance, it sometimes hangs and stops responding, and all connections to it time out. So for multiple models we have to do a lot of load testing to decide which models can be deployed together on one instance and which need to go on another.

@aaroey
Member

aaroey commented Aug 26, 2019

@ctuluhu @troycheng @unclepeddy one way to mitigate the problem is to set the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true when you launch your model server. It will grab the minimum required GPU memory at startup and gradually increase consumption as needed. Please let me know if it works.
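
For reference, a minimal sketch of how the flag behaves in a plain Python process (this assumes a GPU build of TensorFlow; for tensorflow/serving the same variable would be set in the environment that launches the model server, e.g. via `docker run -e TF_FORCE_GPU_ALLOW_GROWTH=true`):

```python
# Minimal sketch, assuming a GPU build of TensorFlow.
# The variable must be set before TensorFlow initializes its GPU devices.
import os
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf  # import only after the variable is set

# With allow-growth enabled, TensorFlow starts with a small GPU allocation
# and grows it on demand instead of reserving nearly all GPU memory upfront.
x = tf.random.normal([1024, 1024])
print(tf.reduce_sum(tf.matmul(x, x)))
```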

@aaroey
Member

aaroey commented Aug 26, 2019

Please also see this Stack Overflow question about how to monitor memory usage using memory_stats ops and run_metadata.
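
For reference, a rough TF1-style sketch of the run_metadata route (the exact step-stats fields used below are my reading of the StepStats proto and may need adapting):

```python
# Rough sketch: read allocator statistics from run_metadata after a session run.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

x = tf.random_normal([1024, 1024])
y = tf.matmul(x, x)

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(y, options=run_options, run_metadata=run_metadata)

# step_stats carries per-node allocator statistics, including peak bytes.
for dev in run_metadata.step_stats.dev_stats:
    for node in dev.node_stats:
        for mem in node.memory:
            if mem.peak_bytes:
                print(dev.device, node.node_name, mem.allocator_name, mem.peak_bytes)
```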

@rmothukuru rmothukuru self-assigned this Sep 12, 2019
@rmothukuru

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

@patrickvonplaten

Is there any update on this issue? In TF 2.2 I still don't see an easy way to measure actual and peak memory usage.

@junneyang

Why was this closed?

@rmothukuru rmothukuru assigned guanxinq and unassigned peddybeats and rmothukuru Dec 17, 2020
@rmothukuru rmothukuru reopened this Dec 17, 2020
@zw0610

zw0610 commented Feb 26, 2021

> @ctuluhu @troycheng @unclepeddy one way to mitigate the problem is to set the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true when you launch your model server. It will grab the minimum required GPU memory at startup and gradually increase consumption as needed. Please let me know if it works.

I believe this can be considered a basic solution to the problem. But GPU memory usage cannot be fully attributed to individual loaded models, since part of it is consumed by things like the CUDA context, which is shared among the loaded models.

Meanwhile, it seems there should be a limit on each model's GPU memory growth, related to the model's parameters and the max batch size configured in TF Serving.

Also, TF_FORCE_GPU_ALLOW_GROWTH=true should not affect TF Serving's latency for handling requests after the first one (if memory is allocated for the entire batch size). The GPU memory allocated earlier does not appear to be deallocated when no further requests are received.

@gaocegege

@Akhp888

Akhp888 commented Mar 9, 2021

Any workaround for this?
At least using the metrics in Prometheus?
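
For what it's worth, TF Serving can expose a Prometheus scrape endpoint when started with a monitoring config (`--monitoring_config_file`), but as of this thread those metrics cover request/runtime counters rather than per-model GPU memory. A rough sketch of scraping and filtering that endpoint, assuming the default path from the docs and a local REST port of 8501:

```python
# Rough sketch: scrape TF Serving's Prometheus endpoint and print metric lines.
# Assumes the model server was started with a monitoring config enabling
# prometheus_config on path "/monitoring/prometheus/metrics" and REST port 8501.
import requests

METRICS_URL = "http://localhost:8501/monitoring/prometheus/metrics"  # assumed host/port

def dump_metrics(substring=""):
    """Print exposed metric lines, optionally filtered by a substring."""
    body = requests.get(METRICS_URL, timeout=5).text
    for line in body.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        if substring in line:
            print(line)

if __name__ == "__main__":
    dump_metrics()  # note: no per-model GPU memory metrics here as of this thread
```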

@gaocegege

We still don't see an easy way to monitor GPU memory usage. Is there any progress?

@spate141

spate141 commented Aug 6, 2021

It's been 2 years and 4 days, and we still don't have any update on one of the most vital parts.

Great!

@guanxinq
Contributor

Sorry for the late reply.

Could you try the memory profile tool (https://www.tensorflow.org/guide/profiler#memory_profile_tool) to see if it helps?
You could also take a look at https://www.tensorflow.org/api_docs/python/tf/config/experimental/get_memory_info, which provides the current and peak memory that TensorFlow is actually using.
However, this might not work since the models are running online on C++ model servers.
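
For completeness, a small in-process sketch of the second suggestion (get_memory_info needs a fairly recent TF 2.x release and a visible GPU; as noted above, it does not apply directly to the C++ model server):

```python
# Small in-process sketch; assumes TF 2.5+ and a visible "GPU:0" device.
import tensorflow as tf

# Load or build a model, run some work, then inspect the allocator stats.
x = tf.random.normal([2048, 2048])
_ = tf.matmul(x, x)

info = tf.config.experimental.get_memory_info("GPU:0")
print("current bytes:", info["current"])
print("peak bytes:", info["peak"])

# Peak stats can be reset between model loads to attribute usage per model
# (reset_memory_stats is available in newer TF releases).
tf.config.experimental.reset_memory_stats("GPU:0")
```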

@spate141

@guanxinq I think people are more interested in something from the tensorflow/serving side!
An example use case would be monitoring GPU usage while serving models in production. If that information were available through some API endpoint, it could be used to scale the cluster or increase the number of backend worker models. nvidia-smi and related tools exist, but something exposed directly by the serving process would definitely be more useful.
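
As an interim workaround (per process only, not per model), one option is a small sidecar that queries NVML through the pynvml package and reports GPU memory usage, which at least covers the model server process as a whole:

```python
# Workaround sketch using NVML (pynvml): reports per-GPU and per-process usage,
# but cannot break the model server's usage down per loaded model.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0; adjust as needed
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU0: {mem.used / 2**20:.0f} MiB used of {mem.total / 2**20:.0f} MiB")

    for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        used = proc.usedGpuMemory or 0  # may be unavailable on some drivers
        print(f"pid={proc.pid} used={used / 2**20:.0f} MiB")
finally:
    pynvml.nvmlShutdown()
```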
