[FeatureRequest] Monitoring metrics for supervisors status #17526

Open
layoaster opened this issue Nov 29, 2024 · 2 comments

Comments

@layoaster

Description

Currently, there is no way to monitor the health or state of supervisors through metrics.

I would like to request a metric (or set of metrics) to report the following:

  • Supervisor health status: whether a supervisor is healthy or not (Druid API).
  • Supervisor current state/status, e.g. RUNNING, SUSPENDED, etc. (List of possible states)

Motivation

  • To be able to monitor the status of Druid supervisors via monitoring metrics.
  • Supervisors are a critical piece of Druid when performing streaming ingestion.

@ashwintumma23
Contributor

A caveat we should definitely consider in this request is how we can represent string labels as metrics.

@layoaster
Author

layoaster commented Dec 3, 2024

@ashwintumma23 AFAIK this is not an issue for Prometheus.

A sample implementation could rely on a Prometheus gauge metric that has two possible values, 0 or 1, and leverage "dimensions" (Prometheus labels) to store the different states (RUNNING, SUSPENDED, etc.) as strings, plus a separate label for the supervisor id/name. (https://prometheus.io/docs/concepts/data_model/)

The value 1 corresponds to the current/active state of any given supervisor; every other state of that supervisor reports 0.
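
To make the idea concrete, here is a minimal sketch of what the exposed samples could look like. It is plain Python that renders the Prometheus text format by hand, not Druid code. The metric name `druid_supervisor_state`, the label names `supervisor` and `state`, and the supervisor ids are all assumptions for illustration, and `STATES` is only the subset of states named in this issue.

```python
# Illustrative subset only; the full list of supervisor states is in the Druid docs.
STATES = ["RUNNING", "SUSPENDED"]


def render_supervisor_state(current_states, states=STATES,
                            metric="druid_supervisor_state"):
    """Render one 0/1 gauge sample per (supervisor, state) pair.

    current_states maps a supervisor id to its current state string.
    The metric and label names are hypothetical, not an existing Druid metric.
    """
    lines = [f"# TYPE {metric} gauge"]
    for supervisor in sorted(current_states):
        for state in states:
            # 1 marks the supervisor's active state, 0 every other state.
            value = 1 if current_states[supervisor] == state else 0
            lines.append(
                f'{metric}{{supervisor="{supervisor}",state="{state}"}} {value}'
            )
    return "\n".join(lines)


# Hypothetical supervisor ids, for illustration only.
print(render_supervisor_state({"clicks": "RUNNING", "orders": "SUSPENDED"}))
```

With samples shaped like this, an alert on a suspended supervisor could be a PromQL expression such as `druid_supervisor_state{state="SUSPENDED"} == 1`, again assuming the hypothetical names above.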
