Parallel updates of VPAs and checkpoints. #7951

Open
tkukushkin opened this issue Mar 19, 2025 · 22 comments
Assignees
Labels
area/vertical-pod-autoscaler · help wanted (Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.) · kind/feature (Categorizes issue or PR as related to a new feature.) · triage/accepted (Indicates an issue or PR is ready to be actively worked on.)

Comments

@tkukushkin

Which component are you using?:

/area vertical-pod-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Hi! We have more than 2.7k VPAs in our cluster and we ran into a problem: the recommender becomes really slow, and one recommendation cycle takes more than 10 minutes.

We've tried updating all VPAs and all checkpoints in parallel with simple wait groups, relying on the rate limit from --kube-api-qps, and it works pretty well: the UpdateVPAs step takes around 9 seconds and MaintainCheckpoints takes 13 seconds in our cluster.

This also allowed us to remove the --min-checkpoints and --checkpoints-timeout options because, IMHO, they no longer make sense.

I pushed a modified version for reference: master...tkukushkin:autoscaler:parallel-updates
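Roughly, the idea looks like the following. This is a minimal, self-contained sketch rather than the actual code from that branch; updateVPAStatus and the VPA names are placeholders for illustration only.

package main

import (
	"context"
	"log"
	"sync"
)

// updateVPAStatus stands in for the recommender's real per-VPA status update
// call; it is a placeholder for illustration only.
func updateVPAStatus(ctx context.Context, vpaName string) error { return nil }

// updateAllVPAs fans out one goroutine per VPA. All API calls still go through
// the same client-go token-bucket limiter (--kube-api-qps / --kube-api-burst),
// so the effective request rate stays bounded by that limit.
func updateAllVPAs(ctx context.Context, vpaNames []string) {
	var wg sync.WaitGroup
	for _, name := range vpaNames {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			if err := updateVPAStatus(ctx, name); err != nil {
				log.Printf("failed to update VPA %s: %v", name, err)
			}
		}(name)
	}
	wg.Wait()
}

func main() {
	updateAllVPAs(context.Background(), []string{"ns1/vpa-a", "ns1/vpa-b"})
}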

Describe the solution you'd like.:

I'm not good at Go, otherwise I would open a Pull Request myself, but I think an approach like this could be implemented.

I'm not completely sure the --min-checkpoints and --checkpoints-timeout options should be removed.

Also, maybe it's better to add a separate rate limit for update operations.

tkukushkin added the kind/feature label Mar 19, 2025
@adrianmoisey
Member

Hi! We have more than 2.7k VPAs in our cluster and we ran into a problem: the recommender becomes really slow, and one recommendation cycle takes more than 10 minutes.

Wow! That's pretty darn big!

Thanks for the code change and description; I think we could improve something here. We'll see if someone wants to take the issue and how they'd solve it.
/help-wanted

@adrianmoisey
Member

/help
/triage accepted

@k8s-ci-robot
Contributor

@adrianmoisey:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help
/triage accepted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the triage/accepted and help wanted labels Mar 19, 2025
@omerap12
Member

Thanks for this! I really like that approach :) Updating the VPAs concurrently is definitely a great idea.
I do think there's something off with min-checkpoints, but I disagree about checkpoints-timeout.

/assign

@adrianmoisey
Member

I'm wondering if this will require a setting to control concurrency, but I'll wait for a PR before I make more comments :P

@omerap12
Member

I'm wondering if this will require a setting to control concurrency, but I'll wait for a PR before I make more comments :P

Yeah, I see your point. I'll dig into this and see if we can come up with a smarter approach. In any case, I plan to put "concurrent mode" behind a flag (defaulting to false for now) so users can choose between both modes. WDYT?

@adrianmoisey
Member

Yeah, I see your point. I'll dig into this and see if we can come up with a smarter approach. In any case, I plan to put "concurrent mode" behind a flag (defaulting to false for now) so users can choose between both modes. WDYT?

That makes perfect sense to me

@voelzmo
Contributor

voelzmo commented Mar 20, 2025

Hey @tkukushkin,

looking at the numbers you provide, it seems that you left the default values for the recommender client-side qps settings:

flag.Float64Var(&cf.KubeApiQps, "kube-api-qps", 5.0, "QPS limit when making requests to Kubernetes apiserver")
flag.Float64Var(&cf.KubeApiBurst, "kube-api-burst", 10.0, "QPS burst limit when making requests to Kubernetes apiserver")

2700 VPAs / 5 QPS = 540 s, i.e. 9 minutes, so my guess is it should take about 9 minutes to run UpdateVPAs and then some additional time until the minimum number of checkpoints has been updated.

As you can see, those values (5 requests/s to the KAPI) are not fit for large-scale usage. There was an upstream discussion about removing client-side rate limiting altogether, now that API Priority and Fairness is available for the KAPIs; maybe we could pick this up for VPA as well and remove the client-side rate limiting entirely?
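For context, client-go only builds its token-bucket limiter when QPS is positive, so a negative QPS on the client config effectively turns client-side throttling off and leaves throttling to server-side API Priority and Fairness. A minimal sketch (not a concrete proposal for the VPA flags; the kubeconfig path is illustrative):

package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client config from a kubeconfig path (path is illustrative).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	// A negative QPS makes client-go skip its token-bucket limiter entirely,
	// leaving throttling to server-side API Priority and Fairness.
	config.QPS = -1
	config.Burst = -1
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	fmt.Printf("built clientset %T without client-side throttling\n", clientset)
}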

@voelzmo
Contributor

voelzmo commented Mar 20, 2025

As additional context: vpa-recommender has some pretty good instrumentation, allowing you to keep an eye on how long the individual steps in its loop take. A few years back, I had an issue showing the histograms for those steps in action – we had similar problems in large-scale environments like you do now: #4498

@tkukushkin
Author

Hey @voelzmo,

looking at the numbers you provide, it seems that you left the default values for the recommender client-side qps settings:

I didn't change the defaults; we override them through the command-line options of our recommender's Deployment. Our current values are --kube-api-qps=200 --kube-api-burst=200.

There was kubernetes/kubernetes#111880 about removing client-side rate limiting altogether, now that API Priority and Fairness is available for the KAPIs; maybe we could pick this up for VPA as well and remove the client-side rate limiting entirely?

My knowledge about it is really poor, so I just trust your opinion.

As additional context: vpa-recommender has some pretty good instrumentation, allowing you to keep an eye on how long the individual steps in its loop take.

Yeah, we know about these metrics and monitor them.

@voelzmo
Contributor

voelzmo commented Mar 20, 2025

Oh that's so confusing, because the numbers seemed to line up so nicely! Which version of VPA are you running?

Maybe the recommender needs more than one call to update a VPA status then?
2700/200 = 13.5, so this whole thing should finish in <14 seconds. Could you try increasing the QPS limits to e.g. 1000?

@tkukushkin
Author

Which version of VPA are you running?

We're running 1.2.1.

Maybe the recommender needs more than one call to update a VPA status then?

As far as I understand, it's one call to update a VPA and one call to update a checkpoint. Updating the VPAs takes 9 seconds and updating the checkpoints takes 13 seconds, so in total one iteration takes less than 25 seconds, which is totally fine for us.

Could you try increasing the QPS limits to e.g. 1000?

I wouldn't like to increase the load on our Kubernetes masters that much.

@voelzmo
Contributor

voelzmo commented Mar 20, 2025

Oh, lol, sorry – I misread seconds for minutes. Not enough coffee this morning ☕

@voelzmo
Contributor

voelzmo commented Mar 20, 2025

No, wait, I did read "minutes" in your original post:

Hi! We have more than 2.7k VPAs in our cluster and we ran into a problem: the recommender becomes really slow, and one recommendation cycle takes more than 10 minutes.

In the scenario where one loop took 10 minutes to execute:

  • what were your qps settings?
  • how long did the steps take?

I understand that with your modified code the UpdateVPAs step takes 9 seconds and MaintainCheckpoints takes 13 seconds. It makes sense that this is fast enough for you ;)

I wouldn't like to increase the load on our Kubernetes masters that much.

That's what the goroutine-based solution does as well, doesn't it?
The reason why using goroutines is so much faster is that it effectively skips the client-side rate limiting by doing more queries in less time. You should see similar execution times with high enough QPS settings.

@omerap12
Member

That's what the goroutine-based solution does as well, doesn't it? The reason why using goroutines is so much faster is that it effectively skips the client-side rate limiting by doing more queries in less time. You should see similar execution times with high enough QPS settings.

So if I got it right, we should completely remove client-side rate limiting (regardless of this issue), since Kubernetes has had built-in flow control (https://kubernetes.io/docs/concepts/cluster-administration/flow-control/) since version 1.20, right?

@tkukushkin
Author

what were your qps settings?

If memory serves me well we had default settings.

how long did the steps take?

I don't have this information anymore, it was long ago 😞

But I remember that even after this change, the default rate limit was not enough to fit into 1 minute, so we increased it through the CLI options.

That's what the goroutine-based solution does as well, doesn't it?
The reason why using goroutines is so much faster is that it effectively skips the client-side rate limiting by doing more queries in less time.

I don't get this point, sorry. Could you please explain why it skips client-side rate limiting?

I believe the goroutine-based solution makes as many concurrent requests as the client-side rate limit allows.

And I see a lot of logs like:

Waited for 9.935918784s due to client-side throttling, not priority and fairness, ...

You should see similar execution times with high enough QPS settings.

The current approach makes only one request at a time. Even if the rate limit is high, say 1000 QPS, the API server would have to reply within 1 ms for the recommender to actually make 1000 requests per second.
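As a rough illustration (the 20 ms latency is a made-up number): sequentially, 20 ms per request caps you at ~50 requests/s, so 2700 updates take ~54 seconds no matter how high the QPS limit is set; with, say, 50 requests in flight concurrently, the same work takes on the order of a second, still bounded by the client-side QPS limit and by whatever the API server can absorb.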

@tkukushkin
Author

I apologize for misleading you with the numbers by not providing the settings under which we measured them.

@voelzmo
Contributor

voelzmo commented Mar 20, 2025

Yeah, sorry, I'm easily confused today. Great point about the query latency, which I absolutely wasn't accounting for!

So to summarize:

  • we understand why an unmodified vpa-recommender with default QPS settings takes ~10 minutes to process a single loop at the scale you're running it at:
    • client-side rate limiting alone accounts for at least 9 minutes (even with a query latency of 0)
    • query latency can make this even worse
  • we understand why a concurrent approach can be helpful:
    • client-side rate limit increases are only helpful up to a certain point. Once you reach a certain scale, it is the query latency to the KAPI that limits your VPA update queries per second
  • we understand that we need/want an option to control the amount of concurrency:
    • you showed that kube-api-qps and kube-api-burst can achieve this implicitly, because client-go seems to be able to track this in a goroutine-safe manner (TIL, I wasn't aware that this works). We still end up creating a goroutine for each VPA, though, even if they cannot do their work concurrently due to client-side rate limiting
    • we can also think about doing it explicitly by limiting the number of goroutines and distributing the VPAs across them (see the sketch after this list)

Does that make sense?
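To illustrate the explicit variant from the last bullet (a sketch only; updateVPAStatus, the VPA names, and the worker count are placeholders): a fixed pool of workers pulls VPA names from a channel, so the number of concurrent updates is capped regardless of the QPS/burst settings.

package main

import (
	"context"
	"log"
	"sync"
)

// updateVPAStatus stands in for the real per-VPA update call.
func updateVPAStatus(ctx context.Context, vpaName string) error { return nil }

// updateAllVPAsBounded distributes the VPAs across a fixed number of workers,
// so the number of concurrent API calls is limited explicitly rather than only
// by the client-side QPS/burst settings.
func updateAllVPAsBounded(ctx context.Context, vpaNames []string, workers int) {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range jobs {
				if err := updateVPAStatus(ctx, name); err != nil {
					log.Printf("failed to update VPA %s: %v", name, err)
				}
			}
		}()
	}
	for _, name := range vpaNames {
		jobs <- name
	}
	close(jobs)
	wg.Wait()
}

func main() {
	updateAllVPAsBounded(context.Background(), []string{"ns1/vpa-a", "ns1/vpa-b"}, 10)
}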

@tkukushkin
Author

tkukushkin commented Mar 20, 2025

Yes, it totally makes sense.

Btw, I've just tested the unmodified version of the recommender with our current rate-limit configuration, and one iteration takes more than 6 minutes, even with a timeout error from MaintainCheckpoints. So it confirms that concurrency really helps here.

@adrianmoisey
Member

So to summarize:

  • we understand why an unmodified vpa-recommender with default QPS settings takes ~10 minutes to process a single loop at the scale you're running it at:

    • client-side rate limiting alone accounts for at least 9 minutes (even with a query latency of 0)
    • query latency can make this even worse
  • we understand why a concurrent approach can be helpful:

    • client-side rate limit increases are only helpful up to a certain point. Once you reach a certain scale, it is the query latency to the KAPI that limits your VPA update queries per second
  • we understand that we need/want an option to control the amount of concurrency:

    • you showed that kube-api-qps and kube-api-burst can achieve this implicitly, because client-go seems to be able to track this in a goroutine-safe manner (TIL, I wasn't aware that this works). We still end up creating a goroutine for each VPA, though, even if they cannot do their work concurrently due to client-side rate limiting
    • we can also think about doing it explicitly by limiting the number of goroutines and distributing the VPAs across them

This matches my understanding of the system.

If concurrency is increased too high, the QPS and burst limits in client-go get hit.

I assume a goroutine per VPA (as in the current solution) is overkill; I think we need to set a sane (safe?) default, but also make it configurable, should users need to increase the throughput.
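For example, following the style of the existing recommender flags (the flag name, config field, and default below are purely illustrative, not an agreed-upon interface):

// Hypothetical flag: caps how many VPA/checkpoint updates run in parallel.
flag.IntVar(&cf.ConcurrentUpdateWorkers, "concurrent-update-workers", 10, "Maximum number of VPA object/checkpoint updates performed concurrently")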

@omerap12
Member

Yeah, sorry, I'm easily confused today. Great point about the query latency, which I absolutely wasn't accounting for!

So to summarize:

  • we understand why an unmodified vpa-recommender with default QPS settings takes ~10 minutes to process a single loop at the scale you're running it at:

    • client-side rate limiting alone accounts for at least 9 minutes (even with a query latency of 0)
    • query latency can make this even worse
  • we understand why a concurrent approach can be helpful:

    • client-side rate limit increases are only helpful up to a certain point. Once you reach a certain scale, it is the query latency to the KAPI that limits your VPA update queries per second
  • we understand that we need/want an option to control the amount of concurrency:

    • you showed that kube-api-qps and kube-api-burst can achieve this implicitly, because client-go seems to be able to track this in a goroutine-safe manner (TIL, I wasn't aware that this works). We still end up creating a goroutine for each VPA, though, even if they cannot do their work concurrently due to client-side rate limiting
    • we can also think about doing it explicitly by limiting the number of goroutines and distributing the VPAs across them

Does that make sense?

Thanks for the great summary!

@omerap12
Member

/unassign
/assign @voelzmo
