
Add GRPC Interceptors to TF Serving to capture transport-level overheads #1955

Open
salliewalecka opened this issue Dec 22, 2021 · 2 comments

Comments

@salliewalecka

Feature Request

If this is a feature request, please fill out the following form in full:

Describe the problem the feature is intended to solve

I want to close the gap between the latency reported by :tensorflow:core:graph_run_time_usecs_histogram_bucket and the latency seen by the client by adding transport-level tracing. That would give additional metrics for network delay, request queuing, and request serialization/deserialization on the server side. I've experienced high latencies that cannot be explained by TensorBoard or the graph latency, which has turned out to block launching some models.

Describe the solution

I want to get metrics similar to DoorDash's implementation of this tracing using gRPC interceptors. Client interceptors alone are not enough; we need server-side interceptors to track the whole request lifecycle. Thus, we need the ability to register interceptors that report the durations of request events. I'm not sure what the exact mechanism should be for collecting these metrics once the interceptors produce them, but they should end up on the metrics endpoint that Prometheus scrapes.
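To make the request concrete, here is a minimal sketch of the server-side wrapping such an interceptor would do, in stdlib-only Python. TF Serving itself is C++, so a real patch would use gRPC's C++ server interceptor hooks, and with grpcio this wrapper would live inside a `grpc.ServerInterceptor.intercept_service()`; the names `record_latency`, `timed_handler`, and `LATENCY_SAMPLES` are hypothetical stand-ins for whatever would feed the Prometheus metrics endpoint.

```python
import time
from collections import defaultdict

# Hypothetical sink for measured durations; in TF Serving this would
# feed the metrics endpoint that Prometheus scrapes instead.
LATENCY_SAMPLES = defaultdict(list)

def record_latency(method, seconds):
    LATENCY_SAMPLES[method].append(seconds)

def timed_handler(method, handler):
    """Wrap an RPC handler so its wall-clock time is recorded.

    With grpcio, intercept_service() would obtain the handler from
    continuation() and wrap it exactly like this before returning it.
    """
    def wrapper(request):
        start = time.monotonic()
        try:
            return handler(request)
        finally:
            record_latency(method, time.monotonic() - start)
    return wrapper

# Usage with a stand-in handler in place of the real Predict implementation:
predict = timed_handler("/tensorflow.serving.PredictionService/Predict",
                        lambda request: {"outputs": request})
predict({"inputs": [1.0]})
```

The key point is that the timing wraps the handler on the server, so the difference between this number and the client-observed latency isolates the transport-level overhead.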

Describe alternatives you've considered

We can't get the information we need from client-only metrics, and we have looked through all the other metrics offered by TF Serving; none of them explain the extra non-graph latency. We've run latency tests from different points in our infrastructure, but having these metrics would let us pinpoint the source of the latency.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Container-Optimized OS
  • TensorFlow Serving installed from (source or binary): docker image
  • TensorFlow Serving version: 2.5.2 and 2.6.1
@salliewalecka
Author

DoorDash provides some good pseudocode for their interceptor. If you could point me to the analogous spot to add the tracer to your server, that would also be appreciated.
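For the client half, the DoorDash-style interceptor boils down to timing the full RPC at the call site; sketched below in stdlib Python with hypothetical names (`timed_call`, the illustrative numbers are not real measurements):

```python
import time

def timed_call(stub_method, request, record):
    # Client-side analogue of DoorDash's interceptor: time the full RPC,
    # including network transit, queuing, and (de)serialization.
    start = time.monotonic()
    response = stub_method(request)  # e.g. the Predict stub in TF Serving
    record(time.monotonic() - start)
    return response

durations = []
timed_call(lambda req: req.upper(), "ping", durations.append)

# With both client- and server-side measurements, the transport-level
# overhead falls out by subtraction (illustrative numbers only):
client_seen_s = 0.120   # measured by the client-side interceptor
graph_time_s = 0.080    # from graph_run_time_usecs_histogram_bucket
overhead_s = client_seen_s - graph_time_s
```

This is exactly why client interceptors alone are insufficient: without a server-side timestamp, `overhead_s` lumps together network delay, queuing, and serialization with no way to attribute it.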

@ndeepesh

Hello, any progress on this request? Alternatively, could you point to the code where we could add custom client interceptors? That would be great.


6 participants