
Add GRPC Interceptors to TF Serving to capture transport-level overheads #1955

Open
salliewalecka opened this issue Dec 22, 2021 · 2 comments

Comments

@salliewalecka

Feature Request

If this is a feature request, please fill out the following form in full:

Describe the problem the feature is intended to solve

I want to close the gap between the latency reported by :tensorflow:core:graph_run_time_usecs_histogram_bucket and the latency seen by the client by adding transport-level tracing. That would give additional metrics for network delay, request queuing, and request serialization/deserialization on the server side. I've experienced high latencies that cannot be explained by TensorBoard or the graph latency, which has turned out to block launching some models.

Describe the solution

I want to get metrics similar to DoorDash's implementation of this tracing using gRPC interceptors. Client interceptors alone are not enough; we need server-side interceptors to track the whole request lifecycle. Thus, we need the ability to register interceptors that report the durations of request events. I'm not sure what the exact mechanism should be for collecting these metrics once the interceptors produce them, but they should end up on the metrics endpoint that Prometheus scrapes.
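To make the request concrete, here is a minimal sketch of the server-side wrapping such an interceptor would do, in stdlib-only Python. TF Serving itself is C++, so a real patch would use gRPC's C++ server interceptor hooks, and with grpcio this wrapper would live inside a `grpc.ServerInterceptor.intercept_service()`; the names `record_latency`, `timed_handler`, and `LATENCY_SAMPLES` are hypothetical stand-ins for whatever would feed the Prometheus metrics endpoint.

```python
import time
from collections import defaultdict

# Hypothetical sink for measured durations; in TF Serving this would
# feed the metrics endpoint that Prometheus scrapes instead.
LATENCY_SAMPLES = defaultdict(list)

def record_latency(method, seconds):
    LATENCY_SAMPLES[method].append(seconds)

def timed_handler(method, handler):
    """Wrap an RPC handler so its wall-clock time is recorded.

    With grpcio, intercept_service() would obtain the handler from
    continuation() and wrap it exactly like this before returning it.
    """
    def wrapper(request):
        start = time.monotonic()
        try:
            return handler(request)
        finally:
            record_latency(method, time.monotonic() - start)
    return wrapper

# Usage with a stand-in handler in place of the real Predict implementation:
predict = timed_handler("/tensorflow.serving.PredictionService/Predict",
                        lambda request: {"outputs": request})
predict({"inputs": [1.0]})
```

The key point is that the timing wraps the handler on the server, so the difference between this number and the client-observed latency isolates the transport-level overhead.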

Describe alternatives you've considered

We can't get the information we need from client-only metrics, and we have looked through all the other metrics offered by TF Serving; none of them explain the extra non-graph latency. We've run latency tests from different points in our infrastructure, but having these metrics would let us pinpoint the source of the latency.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Container-Optimized OS
  • TensorFlow Serving installed from (source or binary): docker image
  • TensorFlow Serving version: 2.5.2 and 2.6.1
@salliewalecka
Author

DoorDash provides some good pseudocode for their interceptor. If you could point me to the analogous spot to add the tracer to your server, that would also be appreciated.
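For the client half, the DoorDash-style interceptor boils down to timing the full RPC at the call site; sketched below in stdlib Python with hypothetical names (`timed_call`, the illustrative numbers are not real measurements):

```python
import time

def timed_call(stub_method, request, record):
    # Client-side analogue of DoorDash's interceptor: time the full RPC,
    # including network transit, queuing, and (de)serialization.
    start = time.monotonic()
    response = stub_method(request)  # e.g. the Predict stub in TF Serving
    record(time.monotonic() - start)
    return response

durations = []
timed_call(lambda req: req.upper(), "ping", durations.append)

# With both client- and server-side measurements, the transport-level
# overhead falls out by subtraction (illustrative numbers only):
client_seen_s = 0.120   # measured by the client-side interceptor
graph_time_s = 0.080    # from graph_run_time_usecs_histogram_bucket
overhead_s = client_seen_s - graph_time_s
```

This is exactly why client interceptors alone are insufficient: without a server-side timestamp, `overhead_s` lumps together network delay, queuing, and serialization with no way to attribute it.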

@ndeepesh

Hello, any progress on this request? Alternatively, could you point to the code where we could add custom client interceptors? That would be great.


6 participants