[Feature] Enable prometheus metrics for local queues #2516

Merged

Conversation

varshaprasad96
Member

What type of PR is this?

/kind feature
/kind documentation

What this PR does / why we need it:

This PR introduces an enhancement to enable collection of prometheus metrics for local queues.

Addresses issue: #1833

Which issue(s) this PR fixes:

Partially fixes #1833

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NA

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. kind/documentation Categorizes issue or PR as related to documentation. labels Jul 2, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 2, 2024
This PR introduces an enhancement to enable collection of
prometheus metrics for local queues.

Addresses issue: kubernetes-sigs#1833

Signed-off-by: Varsha Prasad Narsing <[email protected]>
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 2, 2024

netlify bot commented Jul 2, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit be1601a
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/669aad3313e139000828f959

@varshaprasad96
Member Author

@astefanutti @alculquicondor @tenzen-y Could you please take a look at the proposal and provide your input? Thank you!

@alculquicondor
Contributor

/assign @PBundyra

@PBundyra
Contributor

PBundyra commented Jul 8, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 8, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b3460149244f8070ae015034e8bba83771759ecc

@PBundyra
Contributor

PBundyra commented Jul 8, 2024

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 8, 2024
@alculquicondor
Contributor

is this ready for a pass from approvers?

@PBundyra
Contributor

PBundyra commented Jul 9, 2024

> is this ready for a pass from approvers?

Yes

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 10, 2024
Comment on lines 34 to 37
because they are global and cannot be filtered by namespaces. Furthermore, accessing cluster-scoped metrics
in secured Prometheus instances is generally restricted to cluster admin users with cluster level permissions across all namespaces and
tenants. This restriction makes it challenging for batch users to obtain the specific metrics they need for effective workload
management and to gain insights into their workloads within their limited scope of access.
Contributor

Could this KEP also specify how we will prevent users with insufficient permissions from accessing metrics they shouldn't be able to see?

Contributor

If this proposal is not suggesting how to prevent access from namespaces that shouldn't have the permission, then I would remove this phrase.

Contributor

I agree. Maybe we could narrow the scope of this KEP, @varshaprasad96? At first glance, managing permissions seems challenging, or it would require an external mechanism.

Member Author

@varshaprasad96 varshaprasad96 Jul 12, 2024

Sorry for the late reply! I've been considering various options, and it's clear that publishing local queue metrics in each namespace could lead to complications. This approach would require multiple endpoints with their respective Service Monitors or, if using a single central Service Monitor, the correct RBAC setup to allow client access. The centralised approach could still pose some issues, such as namespace admins potentially being able to view metrics from other namespaces if they are not provided with the right service account (SA).

One potential solution for cluster admins is to scaffold out a Service Monitor (SM) with metrics labeling for specific namespaces from the same service endpoint. This would enable a common service endpoint for all local queues. Admins could then provide a service account with the right RBAC to a specific batch user, restricting their access to that particular Service Monitor.

However, this solution is difficult to implement in Kueue right away, since it requires figuring out the right set of scaffolds, and it seems to be the responsibility of the cluster admin. This could probably be a customisation that is documented for now.

That being said, given the complexity, this topic probably warrants further brainstorming and deserves a separate KEP. For now, I'll remove this reference and update the proposal to indicate that, for this iteration, metrics will be exported in the controller namespace, alongside cluster queue metrics, at the same endpoint. Does that sound reasonable?
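
As a rough illustration of the single-endpoint idea above (not part of the proposal itself), the sketch below exposes samples for all local queues from one place, distinguished by `name` and `namespace` labels, and then applies a keep-style filter similar in spirit to a Prometheus relabel rule. The metric name `example_local_queue_pending_workloads`, the `Sample` type, and the helper functions are hypothetical and chosen only for this example; they are not Kueue's or Prometheus's API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One gauge sample for a local queue (metric name is hypothetical)."""
    metric: str
    name: str        # local queue name
    namespace: str   # namespace the local queue lives in
    value: float


def render(samples):
    """Render samples as lines in the Prometheus text exposition format."""
    lines = []
    for s in samples:
        labels = f'name="{s.name}",namespace="{s.namespace}"'
        lines.append(f"{s.metric}{{{labels}}} {s.value:g}")
    return "\n".join(lines)


def keep_namespace(samples, pattern):
    """Mimic a 'keep' relabel rule: retain samples whose namespace label
    fully matches the regex and drop everything else."""
    rx = re.compile(pattern)
    return [s for s in samples if rx.fullmatch(s.namespace)]


# All local queues are exported from one endpoint, told apart by labels.
ALL = [
    Sample("example_local_queue_pending_workloads", "lq-a", "team-a", 3),
    Sample("example_local_queue_pending_workloads", "lq-b", "team-b", 7),
]

if __name__ == "__main__":
    # Only the samples for namespace "team-a" survive the filter.
    print(render(keep_namespace(ALL, "team-a")))
```

Filtering like this only narrows what a given scrape configuration keeps; it does not by itself stop a user from reading the unfiltered endpoint, which is why access control would still rest with the cluster admin.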

Contributor

Yes it does.

You can remove the per-namespace topic from the motivation and add a note in non-goals.

@varshaprasad96 varshaprasad96 force-pushed the kep-local-queue-metrics branch 2 times, most recently from bc697cc to 15ef983 Compare July 16, 2024 09:33
@varshaprasad96
Member Author

@PBundyra @alculquicondor @tenzen-y I've updated the proposal based on the reviews. Please take a look when you get a chance. Thank you!

@PBundyra
Contributor

PBundyra commented Jul 19, 2024

Thank you @varshaprasad96!

Please address my latest suggestions; besides that, LGTM.

@alculquicondor @tenzen-y Please take a look

@alculquicondor
Contributor

/approve

I'll leave LGTM to @PBundyra

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 19, 2024
@varshaprasad96 varshaprasad96 force-pushed the kep-local-queue-metrics branch 2 times, most recently from 9d3d40d to a1320e1 Compare July 19, 2024 17:56
Member

@tenzen-y tenzen-y left a comment

/approve

I'll leave LGTM to @PBundyra

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, tenzen-y, varshaprasad96

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alculquicondor,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This commit addresses reviews by adding additional metrics
for local queue.

Signed-off-by: Varsha Prasad Narsing <[email protected]>
@varshaprasad96
Member Author

@PBundyra I've addressed your open comments. If everything looks good, could you please approve the PR? Thanks!

@astefanutti
Member

astefanutti commented Jul 23, 2024

Great work, thanks all!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 23, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 70b33a04bb64be859510af26def073f20d30ffdb

@astefanutti
Member

@PBundyra apologies, I missed the final comments and did not see that you had been given the final LGTM.

@varshaprasad96
Member Author

@PBundyra Just wanted to check if anything else is required to get the proposal merged? Thanks!

@PBundyra
Contributor

Sorry for the late response, I was out of office for the last couple of days. Thanks for the KEP @varshaprasad96, great work!
/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 29, 2024
@k8s-ci-robot k8s-ci-robot merged commit fd67975 into kubernetes-sigs:main Jul 29, 2024
7 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.9 milestone Jul 29, 2024
kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024
…#2516)

* [Feature] Enable prometheus metrics for local queues

This PR introduces an enhancement to enable collection of
prometheus metrics for local queues.

Addresses issue: kubernetes-sigs#1833

Signed-off-by: Varsha Prasad Narsing <[email protected]>

* Address reviews

This commit addresses reviews by adding additional metrics
for local queue.

Signed-off-by: Varsha Prasad Narsing <[email protected]>

---------

Signed-off-by: Varsha Prasad Narsing <[email protected]>