Ruler Alerts
qryn v1.3.1+ implements an Alertmanager-compatible API to support Grafana Advanced Alerting.
The ruler API uses the concept of a "namespace" when creating rule groups. This is a stand-in for the name of the rule file in Prometheus. Rule groups must be named uniquely within a namespace.
A master Alertmanager instance is required for the ruler to operate. In qryn, set ALERTMAN_URL to point at your master Alertmanager instance, e.g. http://my.alert.manager.instance:port
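Before pointing qryn at it, it can be worth verifying that the URL you plan to use for ALERTMAN_URL answers on Alertmanager's readiness endpoint. The snippet below is only a sketch of such a check, assuming the Python requests library and Alertmanager's standard /-/ready endpoint (port 9093 is just the usual default, not something qryn requires):

```python
# Sanity-check sketch: confirm the Alertmanager behind ALERTMAN_URL is reachable
# before starting qryn. Assumes Alertmanager's standard /-/ready endpoint.
import os
import requests

# The fallback URL and port 9093 are only placeholders; use your real Alertmanager URL.
alertman_url = os.environ.get("ALERTMAN_URL", "http://my.alert.manager.instance:9093")

resp = requests.get(f"{alertman_url.rstrip('/')}/-/ready", timeout=5)
if resp.ok:
    print(f"Alertmanager at {alertman_url} is ready")
else:
    print(f"Alertmanager answered HTTP {resp.status_code} - check ALERTMAN_URL")
```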
The following endpoints are exposed by the qryn ruler:
GET /api/prom/rules
GET /api/prom/rules/{namespace}
GET /api/prom/rules/{namespace}/{groupName}
POST /api/prom/rules/{namespace}
DELETE /api/prom/rules/{namespace}/{groupName}
DELETE /api/prom/rules/{namespace}
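As an illustration of how these endpoints fit together, here is a minimal round-trip sketch that creates a rule group, lists it, and removes it again. It assumes a qryn instance at http://localhost:3100 and that, like the Loki/Cortex ruler API it mirrors, the POST endpoint accepts a YAML rule group body; the namespace and group names are made up for the example.

```python
# Minimal ruler API round trip (assumptions: qryn at http://localhost:3100,
# Loki-style YAML rule group bodies; namespace/group names are hypothetical).
import requests

QRYN = "http://localhost:3100"
NAMESPACE = "example-namespace"

# A Loki-style rule group using one of the LogQL examples shown later on this page.
rule_group = """
name: example-group
rules:
  - alert: HighErrorRate
    expr: 'rate({system="server.e1.central"} |~ "error" [5s]) > 1'
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Error rate is more than 1 over 5s bucket
"""

# Create (or replace) the rule group inside the namespace.
requests.post(
    f"{QRYN}/api/prom/rules/{NAMESPACE}",
    data=rule_group,
    headers={"Content-Type": "application/yaml"},
).raise_for_status()

# List every rule group, then just the groups of this namespace.
print(requests.get(f"{QRYN}/api/prom/rules").text)
print(requests.get(f"{QRYN}/api/prom/rules/{NAMESPACE}").text)

# Clean up: delete the single group, then the whole namespace.
requests.delete(f"{QRYN}/api/prom/rules/{NAMESPACE}/example-group").raise_for_status()
requests.delete(f"{QRYN}/api/prom/rules/{NAMESPACE}").raise_for_status()
```

Rules created through the Grafana flow described below are stored in the same namespaces and groups, so the same GET calls can be used to inspect them.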
Starting from v1.3.1, qryn supports alerts managed via Grafana.
1. In the Grafana Menu, click the Bell icon to open the Alerting page listing existing alerts.
2. Click New alert rule.
3. In Step 1 of the creation dialog, add the rule name, rule type and data source (qryn).
- In Rule name, add a descriptive name. This name is displayed in the alert rule list. It is also the alertname label for every alert instance that is created from this rule.
- From the Rule type drop-down, select Cortex / Loki managed alert.
- From the Select data source drop-down, select an external qryn data source.
- From the Namespace drop-down, select an existing rule namespace. Otherwise, click Add new and enter a name to create a new one. Namespaces can contain one or more rule groups and only have an organizational purpose.
- From the Group drop-down, select an existing group within the selected namespace. Otherwise, click Add new and enter a name to create a new one. Newly created rules are appended to the end of the group. Rules within a group are run sequentially at a regular interval, with the same evaluation time.
4. In Step 2 of the creation dialog, add the query to evaluate.
- Enter a Metrics or LogQL expression. The rule fires if the evaluation result has at least one series with a value that is greater than 0.
5. In Step 3 of the creation dialog, add conditions.
- In the For text box, specify the duration for which the condition must be true before an alert fires. If you specify 5 minutes, the condition must be true for 5 minutes before the alert fires.
Note: Once a condition is met, the alert goes into the Pending state. If the condition remains active for the duration specified, the alert transitions to the Firing state, else it reverts to the Normal state.
6. In Step 4 of the creation dialog, add additional metadata associated with the rule.
- Add a description and summary to customize alert messages.
- Add Runbook URL, panel, dashboard, and alert IDs.
- Add custom labels.
7. Click Save to save the rule or Save and exit to save the rule and go back to the Alerting page.
The following are a few working alert rule examples using both LogQL and metrics queries:
- avg_over_time({system="server.e2.central"} | json | unwrap cpu_percent [5s]) > 90
  -- CPU over 90% over 5s average
- rate({system="server.e1.central"} |~ "error" [5s]) > 1
  -- Error rate is more than 1 over 5s bucket
- avg_over_time({system="server.e3.west"} | unwrap_value [5s]) > 70
  -- The metric measured over 70 in a 5s average
A rule that is firing will appear in the Firing state in the UI. Below is an example of what a Firing rule looks like:
Note that the rule has been firing for 3s and the type of the data (if configured correctly for the metric, see Metrics HTTP API on how to set correct labels) is indicated as the cpu_percent metric. The condition was set deliberately low to 20% to show a Firing rule.