Socket Closed when running on K8S #5023

Open
avickars opened this issue Nov 1, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@avickars

avickars commented Nov 1, 2022

Description
A clear and concise description of what the bug is.

When running Triton on K8S with exactly this configuration (https://github.com/triton-inference-server/server/tree/main/deploy/k8s-onprem) on AWS's EKS with vertical auto-scaling enabled (https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html), in combination with the C++ async gRPC client, the socket closes intermittently, i.e., we get a "socket closed" error.

Triton Information
What version of Triton are you using?

22.09

Are you using the Triton container or did you build it yourself?

We are using the Triton container.

To Reproduce
Steps to reproduce the behavior.

Deploy the configuration above (https://github.com/triton-inference-server/server/tree/main/deploy/k8s-onprem) on EKS with vertical auto-scaling enabled.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

There are a number of models; unfortunately, they are all proprietary.

Expected behavior
A clear and concise description of what you expected to happen.

The socket shouldn't close.

Unfortunately, I have seen a few issues here that appear fairly similar, but none of them seems to have been resolved.

My suspicion is that the vertical auto-scaling is involved in some way, since the issue appears to stop when vertical scaling is disabled. So I am wondering whether it could cause problems in conjunction with the gRPC requests. I should note that I am very unfamiliar with gRPC, so I'm just speculating. I wanted to post here to see if anyone has any ideas, or if this is potentially a bug.

I am experimenting with the "grpc-keepalive" etc settings in triton right now, and will post the results of that shortly as well.
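For context, a sketch of the kind of server-side gRPC keepalive flags involved. The flag names are Triton's gRPC keepalive options as I understand them from the Triton docs; the model repository path and every value below are placeholders for illustration only, not the settings used in this deployment.

```shell
# Placeholder values for illustration only; not tuned for any cluster.
# --grpc-keepalive-time / --grpc-keepalive-timeout are in milliseconds.
# --grpc-http2-max-pings-without-data=0 means no limit on pings sent without data.
# --grpc-http2-min-recv-ping-interval-without-data sets the minimum client ping
# interval (ms) the server tolerates before counting a "ping strike".
tritonserver \
  --model-repository=/models \
  --grpc-keepalive-time=10000 \
  --grpc-keepalive-timeout=5000 \
  --grpc-keepalive-permit-without-calls=true \
  --grpc-http2-max-pings-without-data=0 \
  --grpc-http2-min-recv-ping-interval-without-data=5000
```

Server-side settings alone may not be enough: in gRPC, the client's keepalive pings also need to respect the server's minimum ping interval, otherwise the server may close the connection for sending too many pings.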

I have also set the Triton server logging verbosity to 1 and enabled --log-info, --log-warning and --log-error, but nothing related to the socket closing shows up in the Triton logs. Is there another way to see the gRPC logging and find out why the socket is closing?

@tanmayv25
Contributor

Looks like an interesting observation. I am not very familiar with how vertical auto-scaling works on AWS's EKS. Does it affect the gRPC connections? Do you have any other gRPC service besides Triton that works without this issue while auto-scaling?

@tanmayv25 tanmayv25 added the bug Something isn't working label Nov 3, 2022