Skip to content

[SLI-Metrics] kuberay_service_ready #3577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

owenowenisme
Copy link
Contributor

@owenowenisme owenowenisme commented May 10, 2025

Why are these changes needed?

Added kuberay_service_ready as written in KubeRay v1.4.0 SLI proposal

❯ curl -s localhost:8080/metrics | grep  kuberay_
# HELP kuberay_service_ready RayServiceReady means users can send requests to the underlying cluster and the number of serve endpoints is greater than 0.
# TYPE kuberay_service_ready gauge
kuberay_service_ready{condition="false",name="rayservice-sample",namespace="default"} 1
❯ curl -s localhost:8080/metrics | grep  kuberay_
# HELP kuberay_service_ready RayServiceReady means users can send requests to the underlying cluster and the number of serve endpoints is greater than 0.
# TYPE kuberay_service_ready gauge
kuberay_service_ready{condition="true",name="rayservice-sample",namespace="default"} 1

Related issue number

Closes #3177

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@owenowenisme
Copy link
Contributor Author

@win5923 @troychiu

@owenowenisme owenowenisme force-pushed the SLI-Metrics/kuberay-service-ready branch from 7b1dcdb to a0ef666 Compare May 12, 2025 17:53
@owenowenisme owenowenisme requested a review from win5923 May 13, 2025 17:09
@owenowenisme owenowenisme marked this pull request as ready for review May 13, 2025 17:09
Copy link
Contributor

@win5923 win5923 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@owenowenisme owenowenisme marked this pull request as draft May 15, 2025 02:11
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme owenowenisme force-pushed the SLI-Metrics/kuberay-service-ready branch from 891aded to b633a37 Compare May 18, 2025 10:02
@owenowenisme owenowenisme marked this pull request as ready for review May 18, 2025 10:09
@owenowenisme
Copy link
Contributor Author

owenowenisme commented May 18, 2025

@troychiu @kevin85421 Would you mind taking a look?

@troychiu
Copy link
Contributor

troychiu commented May 19, 2025

Can't we simply reuse collectRayServiceInfo?

func (c *RayServiceMetricsManager) collectRayServiceInfo(service *rayv1.RayService, ch chan<- prometheus.Metric) {

@owenowenisme
Copy link
Contributor Author

owenowenisme commented May 19, 2025

RayServiceInfo is Desc while RayServiceReady is GaugeVec I don't think we should mix them together.

@troychiu
Copy link
Contributor

RayServiceInfo is a Gauge type of metric which is the same as RayServiceReady. Desc is not a type of metric but a prometheus descriptor. What I meant is we can use the same way to collect RayServiceReady as we did for RayServiceInfo

@owenowenisme
Copy link
Contributor Author

owenowenisme commented May 19, 2025

@troychiu I get what you meant now.

func (c *RayServiceMetricsManager) Collect(ch chan<- prometheus.Metric) {
	var rayServiceList rayv1.RayServiceList
	if err := c.client.List(context.Background(), &rayServiceList); err != nil {
		c.log.Error(err, "Failed to list RayServices")
		return
	}
	for _, rayService := range rayServiceList.Items {
		c.collectRayServiceInfo(&rayService, ch)
	}
	c.RayServiceReady.Collect(ch)
}

func (c *RayServiceMetricsManager) collectRayServiceInfo(service *rayv1.RayService, ch chan<- prometheus.Metric) {
	ch <- prometheus.MustNewConstMetric(
		c.rayServiceInfo,
		prometheus.GaugeValue,
		1,
		service.Name,
		service.Namespace,
	)
	ready := meta.IsStatusConditionTrue(service.Status.Conditions, string(rayv1.RayServiceReady))
	c.RayServiceReady.WithLabelValues(service.Name, service.Namespace, strconv.FormatBool(!ready)).Set(0)
	c.RayServiceReady.WithLabelValues(service.Name, service.Namespace, strconv.FormatBool(ready)).Set(1)
}

But just to make sure, is this what you're thinking in mind?
BTW, I think this is a nice approach! Because we don't have to modify controller with this method. And can apply to rayCluster and rayJob.

@troychiu
Copy link
Contributor

troychiu commented May 19, 2025

Yeah that's it. Actually we can make it simpler.
you can simply do

ch <- prometheus.MustNewConstMetric(
        c.rayServiceReady,
        prometheus.GaugeValue,
        1,
        service.Name,
        service.Namespace,
        condition, // strconv.FormatBool(...) in this case
)

and

for _, rayService := range rayServiceList.Items {
        c.collectRayServiceInfo(&rayService, ch)
        c.collectRayServiceReady(&rayService, ch)
}

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme
Copy link
Contributor Author

Cc @kevin85421 PTAL🙏🙏

Signed-off-by: You-Cheng Lin <[email protected]>
@troychiu
Copy link
Contributor

@owenowenisme can you fix the CI?

@kevin85421
Copy link
Member

I rerun the failed test. @owenowenisme please ping me when all CI tests pass.

@owenowenisme
Copy link
Contributor Author

@kevin85421 CI passed.

@kevin85421 kevin85421 merged commit ee0d6c1 into ray-project:master May 22, 2025
24 checks passed
@owenowenisme owenowenisme deleted the SLI-Metrics/kuberay-service-ready branch May 22, 2025 18:05
pawelpaszki pushed a commit to opendatahub-io/kuberay that referenced this pull request May 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature][metrics] kuberay_service_ready
4 participants