Socket Closed when running on K8S #5023

avickars · 2022-11-01T00:31:46Z

Description
A clear and concise description of what the bug is.

When running Triton on K8S with exactly this configuration (https://github.com/triton-inference-server/server/tree/main/deploy/k8s-onprem) on AWS's EKS with vertical auto-scaling enabled (https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html), in combination with the C++ async gRPC client, the socket closes intermittently, i.e. we are getting a "socket closed" error.

Triton Information
What version of Triton are you using?

22.09

Are you using the Triton container or did you build it yourself?

We are using the Triton container.

To Reproduce
Steps to reproduce the behavior.

Deploy the above configuration (https://github.com/triton-inference-server/server/tree/main/deploy/k8s-onprem) on EKS with vertical auto-scaling enabled as well.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

There are a number of models, unfortunately they are all proprietary.

Expected behavior
A clear and concise description of what you expected to happen.

The socket shouldn't close.

Unfortunately, I have seen a few issues on here that appear to be fairly similar, but it doesn't look like any of them have been sufficiently solved.

My suspicion is that the issue is coming from the vertical auto-scaling in some capacity, as the issue appears to stop when vertical scaling is disabled. So I am wondering if that could cause issues in conjunction with the gRPC requests. I should note that I am very unfamiliar with gRPC requests, so I'm just speculating. I wanted to post here to see if anyone has any ideas, or if this is potentially a bug.

I am experimenting with the "grpc-keepalive" etc settings in triton right now, and will post the results of that shortly as well.

I have also set the triton server logging verbosity to 1 and enabled --log-info, --log-warning and --log-error, however nothing pertaining to the socket closing is showing up in the Triton logs. Is there another way to see the grpc logging and to see why the socket is closing?
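For the client side of that question, one possible approach is sketched below. It assumes the C++ client links the stock gRPC core library, which reads the `GRPC_VERBOSITY` and `GRPC_TRACE` environment variables when it initializes. The helper name `EnableGrpcClientTracing` is hypothetical, the tracer list is only a starting point, and whether the resulting output explains the "socket closed" error is untested here.

```cpp
#include <cstdlib>   // setenv (POSIX), std::getenv
#include <string>

// Hypothetical helper: turns on verbose gRPC logging for this process.
// Assumption: it is called before the first gRPC channel/client is created,
// because gRPC reads these variables during its own initialization.
bool EnableGrpcClientTracing(
    const std::string& tracers = "http,tcp,connectivity_state,http_keepalive") {
  // GRPC_VERBOSITY controls the minimum log severity printed by gRPC.
  const int verbosity_rc = setenv("GRPC_VERBOSITY", "DEBUG", /*overwrite=*/1);
  // GRPC_TRACE selects which internal tracers emit log lines.
  const int trace_rc = setenv("GRPC_TRACE", tracers.c_str(), /*overwrite=*/1);
  return verbosity_rc == 0 && trace_rc == 0;
}
```

Calling `EnableGrpcClientTracing()` at the top of `main`, before the Triton gRPC client is constructed, should send the gRPC library's own log lines to stderr. The same two variables can instead be exported in the shell or set in the pod spec, which avoids a code change.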

tanmayv25 · 2022-11-03T21:33:29Z

Looks like an interesting observation. I am not very familiar with how vertical auto-scaling works on AWS's EKS. Does it affect the gRPC connections? Do you have any other gRPC service besides triton that works without this issue while auto-scaling?

tanmayv25 added the "bug" (Something isn't working) label on Nov 3, 2022
