Enable AMP (Automatic Mixed Precision) in TensorFlow Serving. #1583
I think this should be a very high priority (at the least FP16); otherwise the case for TFS becomes weak.
AMP mainly targets training rather than serving (https://www.tensorflow.org/guide/keras/mixed_precision). Have you observed a significant performance difference for serving as well? If so, could you share the benchmark and related numbers?
How do I turn on AMP in serving? I have observed a 50% improvement in processing time with FP16 over FP32, without any noticeable change in accuracy. Reduced precision is one of the cornerstones of NVIDIA TensorRT, etc. See this one also - https://medium.com/@whatdhack/neural-network-inference-optimization-8651b95e44ee .
Is there a way to do the following in TFS?
I just ran some tests on a Mask R-CNN SavedModel in nvcr.io/nvidia/tensorflow:20.03-tf1-py3. TF_ENABLE_AUTO_MIXED_PRECISION seems to work very well for inference - it requires less memory and speeds things up significantly. Here are the numbers, if you need more convincing: with TF_ENABLE_AUTO_MIXED_PRECISION=1, memory = 4.2 GB, inference time 0.25 s; without it, memory = 7.1 GB, inference time 0.53 s.
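For serving rather than a raw session, a minimal sketch of passing the same environment variable into a TF Serving GPU container follows. Note the assumptions: whether the serving binary's TensorFlow build actually honors TF_ENABLE_AUTO_MIXED_PRECISION is exactly the open question of this issue, and the model name and host path are placeholders.

```shell
# Sketch: forward the auto-mixed-precision env var into the official
# tensorflow/serving GPU image. The model path and name below are
# placeholders; the env var takes effect only if the bundled TensorFlow
# build supports the auto mixed precision graph rewrite.
docker run --gpus all -p 8501:8501 \
  -e TF_ENABLE_AUTO_MIXED_PRECISION=1 \
  -e MODEL_NAME=mymodel \
  -v /path/to/saved_model:/models/mymodel \
  tensorflow/serving:latest-gpu
```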
Thanks for the experiments and numbers! Based on the numbers, we could add the option. I will also follow up with our GPU team.
Any update here? Also, is it possible to enable JIT/XLA as well, like #1515?
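For the JIT/XLA part of the question, TensorFlow exposes global auto-clustering through an environment variable, which could be forwarded to the container the same way. This is only a sketch: whether a given TF Serving build acts on TF_XLA_FLAGS is what this issue and #1515 are asking about, and the model name and path are again placeholders.

```shell
# Sketch: request XLA auto-clustering globally via TF_XLA_FLAGS
# (--tf_xla_auto_jit=2 enables JIT compilation for eligible clusters).
docker run --gpus all -p 8501:8501 \
  -e TF_XLA_FLAGS=--tf_xla_auto_jit=2 \
  -e MODEL_NAME=mymodel \
  -v /path/to/saved_model:/models/mymodel \
  tensorflow/serving:latest-gpu
```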
Any update here?
I'd really appreciate this feature being added too.
Hi, any updates here?
Describe the problem the feature is intended to solve
AMP accelerates inference significantly.
Describe the solution
A flag for enabling AMP.
Describe alternatives you've considered
There is no alternative within TensorFlow Serving.
Additional context
N/A