Commit d3f03aa

Blog: Optimizing promql queries
1 parent 59e7d6c commit d3f03aa

3 files changed

+990
-364
lines changed

.gitignore

+5
@@ -30,3 +30,8 @@ compose-simple
 
 /build-image-arm64.tar
 /build-image-amd64.tar
+
+.DS_Store
+.hugo_build.lock
+public/
+.config
@@ -0,0 +1,227 @@
1+
---
2+
date: 2025-04-29
3+
title: "Optimizing PromQL queries: A deep dive"
4+
linkTitle: Optimizing PromQL queries
5+
tags: [ "blog", "cortex", "query", "optimization" ]
6+
categories: [ "blog" ]
7+
projects: [ "cortex" ]
8+
description: >
9+
This guide explains how Cortex evaluates PromQL queries, details how time series data is stored and retrieved, and offers strategies to write performant queries—particularly in high-cardinality environments.
10+
author: Harry John ([@harry671003](https://github.com/harry671003))
11+
---
12+
13+
14+
## Introduction
15+
16+
This guide explains how Cortex evaluates PromQL queries, describes how time series data is stored and retrieved, and offers strategies for writing performant queries, particularly in high-cardinality environments.
17+
18+
Note: If you are new to PromQL, start with the [Querying basics documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/).
19+
20+
## Prometheus Concepts
21+
22+
### Data Model
23+
24+
Prometheus employs a straightforward data model:
25+
26+
* Each time series is uniquely identified by a metric name and a set of label-value pairs.
27+
* Each sample includes:
28+
  * A millisecond-precision timestamp
  * A 64-bit floating-point value
30+
31+
### Label Matchers
32+
33+
Label matchers define the selection criteria for time series within the TSDB. Consider the following PromQL expression:
34+
35+
```
http_requests_total{cluster="prod", job="envoy"}
```
38+
39+
Here, the label matchers are:
40+
41+
* `__name__="http_requests_total"`
42+
* `cluster="prod"`
43+
* `job="envoy"`
44+
45+
46+
Prometheus supports four types of label matchers:
47+
48+
|Type |Syntax |Example |
49+
|--- |--- |--- |
50+
|Equal |label="value" |job="envoy" |
51+
|Not Equal |label!="value" |job!="prometheus" |
52+
|Regex Equal |label=~"regex" |job=~"env.*" |
53+
|Regex Not Equal |label!~"regex" |status!~"4.." |
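
The four matcher types can be sketched as a small predicate over a single label value. This is only an illustration: the `matches` function is hypothetical, and Python's `re` module stands in for the RE2 syntax that Prometheus uses. One detail worth noting is that Prometheus regex matchers are fully anchored, which `re.fullmatch` mimics here.

```python
import re


def matches(matcher_type: str, pattern: str, value: str) -> bool:
    """Return True if a label value satisfies one matcher (illustrative only)."""
    if matcher_type == "=":
        return value == pattern
    if matcher_type == "!=":
        return value != pattern
    if matcher_type == "=~":
        # Anchored match: the regex must cover the whole value.
        return re.fullmatch(pattern, value) is not None
    if matcher_type == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher type: {matcher_type}")


print(matches("=", "envoy", "envoy"))
print(matches("=~", "env.*", "envoy"))
print(matches("=~", "env", "envoy"))   # False: "env" does not cover "envoy"
print(matches("!~", "4..", "404"))     # False: "404" matches the regex
```

Because of anchoring, `job=~"env"` does not select `job="envoy"`; the pattern needs to be `env.*`.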
54+
55+
## Time Series Storage in Cortex
56+
57+
Cortex uses the Prometheus Time Series Database (TSDB) to store time series data. The TSDB is partitioned by time into blocks. Each TSDB block consists of the following:
58+
59+
* `ID` - ID of the block ([ULID](https://github.com/ulid/spec)), which also names the block's directory
60+
* `meta.json` - Contains the metadata of the block
61+
* `index` - A binary file that contains the index
62+
* `chunks` - Directory containing the chunk segment files.
63+
64+
More details: [TSDB format docs](https://github.com/prometheus/prometheus/blob/main/tsdb/docs/format/README.md)
65+
66+
### Index File
67+
68+
The `index` file contains two key mappings for query processing:
69+
70+
* **Postings Offset Table and Postings**: Maps label-value pairs to series IDs
71+
* **Series Section**: Maps series IDs to label sets and chunk references
72+
73+
#### Example
74+
75+
Given the following time series:
76+
77+
```
http_requests_total{cluster="prod", job="envoy", status="200"} -> SeriesID(1)
http_requests_total{cluster="prod", job="envoy", status="400"} -> SeriesID(2)
http_requests_total{cluster="prod", job="envoy", status="500"} -> SeriesID(3)
http_requests_total{cluster="prod", job="prometheus", status="200"} -> SeriesID(4)
```
83+
84+
The index file would store mappings such as:
85+
86+
```
__name__=http_requests_total → [1, 2, 3, 4]
cluster=prod → [1, 2, 3, 4]
job=envoy → [1, 2, 3]
job=prometheus → [4]
status=200 → [1, 4]
status=400 → [2]
status=500 → [3]
```
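
As a rough illustration of how such a mapping can be derived, the sketch below builds an in-memory inverted index from the four example series. It is a simplified model, not the on-disk index format; the `build_postings` helper and the dictionary layout are assumptions made for this example.

```python
from collections import defaultdict

# The four example series, keyed by series ID.
series = {
    1: {"__name__": "http_requests_total", "cluster": "prod", "job": "envoy", "status": "200"},
    2: {"__name__": "http_requests_total", "cluster": "prod", "job": "envoy", "status": "400"},
    3: {"__name__": "http_requests_total", "cluster": "prod", "job": "envoy", "status": "500"},
    4: {"__name__": "http_requests_total", "cluster": "prod", "job": "prometheus", "status": "200"},
}


def build_postings(series_by_id):
    """Map each (label name, label value) pair to a sorted list of series IDs."""
    postings = defaultdict(list)
    for series_id in sorted(series_by_id):
        for name, value in series_by_id[series_id].items():
            postings[(name, value)].append(series_id)
    return dict(postings)


postings = build_postings(series)
for (name, value), ids in sorted(postings.items()):
    print(f"{name}={value} -> {ids}")
```

Keeping each postings list sorted by series ID is what makes the later intersection step cheap.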
95+
96+
### Chunks
97+
98+
Each chunk segment file can store up to **512MB** of data. Each chunk in the segment file typically holds up to **120 samples**.
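
To get a feel for what 120 samples per chunk means in practice, the following back-of-the-envelope sketch estimates how much time one chunk covers and how many chunks a query must decode per series. The 15-second scrape interval is an assumed value, and the estimate ignores partially filled chunks and chunks cut early at block boundaries.

```python
import math

SAMPLES_PER_CHUNK = 120  # typical upper bound mentioned above


def chunk_span_seconds(scrape_interval_s: int) -> int:
    """Approximate wall-clock time covered by one full chunk."""
    return SAMPLES_PER_CHUNK * scrape_interval_s


def chunks_per_series(range_s: int, scrape_interval_s: int) -> int:
    """Approximate number of chunks to decode for one series over a time range."""
    samples = range_s / scrape_interval_s
    return math.ceil(samples / SAMPLES_PER_CHUNK)


# Assumed 15s scrape interval: one chunk spans about 30 minutes.
print(chunk_span_seconds(15))
# A 24-hour query then touches roughly 48 chunks per series.
print(chunks_per_series(24 * 3600, 15))
```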
99+
100+
## Query Execution in Cortex
101+
102+
To optimize PromQL queries effectively, it is essential to understand how queries are executed within Cortex. Consider the following example:
103+
104+
```
sum(rate(http_requests_total{cluster="prod", job="envoy"}[5m]))
```
107+
108+
### Block Selection
109+
110+
Cortex first identifies the TSDB blocks that overlap the query's time range. This step is fast and adds little overhead to query execution.
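
Conceptually, block selection is an interval-overlap check against each block's time range. The sketch below assumes each block covers the half-open range from its minimum time (inclusive) to its maximum time (exclusive), in milliseconds; the `BlockMeta` class, the `select_blocks` function, and the block names are hypothetical and do not reflect how Cortex actually structures this code.

```python
from dataclasses import dataclass

HOUR_MS = 3600 * 1000


@dataclass
class BlockMeta:
    name: str       # placeholder for the block's ULID
    min_time: int   # inclusive, milliseconds
    max_time: int   # exclusive, milliseconds


def select_blocks(blocks, start_ms, end_ms):
    """Return blocks whose [min_time, max_time) range overlaps [start_ms, end_ms]."""
    return [b for b in blocks if b.min_time <= end_ms and start_ms < b.max_time]


blocks = [
    BlockMeta("block-a", 0 * HOUR_MS, 2 * HOUR_MS),
    BlockMeta("block-b", 2 * HOUR_MS, 4 * HOUR_MS),
    BlockMeta("block-c", 4 * HOUR_MS, 6 * HOUR_MS),
]

selected = select_blocks(blocks, 1 * HOUR_MS, 3 * HOUR_MS)
print([b.name for b in selected])
```

Only metadata is compared here, which is why this step is cheap relative to reading the index or chunks.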
111+
112+
### Series Selection
113+
114+
Next, Cortex uses the inverted index to retrieve the set of matching series IDs for each label matcher. For example:
115+
116+
```
__name__="http_requests_total" → [1, 2, 3, 4]
cluster="prod" → [1, 2, 3, 4]
job="envoy" → [1, 2, 3]
```
121+
122+
The intersection of these sets is `[1, 2, 3]`, which corresponds to the following series:
123+
124+
```
http_requests_total{cluster="prod", job="envoy", status="200"}
http_requests_total{cluster="prod", job="envoy", status="400"}
http_requests_total{cluster="prod", job="envoy", status="500"}
```
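
The intersection step can be sketched as a two-pointer merge over sorted postings lists. This is a simplified model using the example data from this post; the `intersect` and `select_series` helpers are hypothetical names, and starting from the shortest list is a design choice of this sketch rather than a statement about the actual implementation.

```python
def intersect(a, b):
    """Intersect two sorted lists of series IDs with a two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def select_series(postings, matchers):
    """Intersect the postings of all equality matchers (at least one is required)."""
    # Starting with the shortest list keeps intermediate results small.
    lists = sorted((postings.get(m, []) for m in matchers), key=len)
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


postings = {
    ("__name__", "http_requests_total"): [1, 2, 3, 4],
    ("cluster", "prod"): [1, 2, 3, 4],
    ("job", "envoy"): [1, 2, 3],
    ("status", "200"): [1, 4],
}

matchers = [("__name__", "http_requests_total"), ("cluster", "prod"), ("job", "envoy")]
print(select_series(postings, matchers))  # [1, 2, 3]
```

Adding a more selective matcher such as `status="200"` shrinks the result further, which is the mechanism behind the advice to use selective label matchers.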
129+
130+
### Sample Selection
131+
132+
The mapping from series to chunks is used to identify the relevant chunks from the chunk segment files. These chunks are decoded to retrieve the underlying time series samples.
133+
134+
### PromQL Evaluation
135+
136+
Using the retrieved series and samples, the PromQL engine evaluates the query. There are two modes of running queries:
137+
138+
* **Instant queries** – Evaluated at a single timestamp
139+
* **Range queries** – Evaluated at regular intervals over a defined time range
140+
141+
## Common Causes of Slow Queries and Optimization Techniques
142+
143+
Several factors influence the latency and resource usage of PromQL queries. This section highlights the key contributors and practical strategies for improving performance.
144+
145+
### Query Cardinality
146+
147+
High cardinality increases the number of time series that must be scanned and evaluated.
148+
149+
#### Recommendations
150+
151+
* Eliminate unnecessary labels from metrics.
152+
* Use selective label matchers to reduce the number of series returned.
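
A quick way to reason about the cost of a label is to compute the upper bound on the number of series a metric can have: the product of the number of distinct values of each label. The sketch below uses assumed label values; the `max_cardinality` helper is hypothetical, and real cardinality is often lower because not every combination occurs.

```python
import math


def max_cardinality(label_values):
    """Upper bound on series count: product of distinct values per label."""
    return math.prod(len(values) for values in label_values.values())


labels = {
    "cluster": ["prod"],
    "job": ["envoy", "prometheus"],
    "status": ["200", "400", "500"],
}
print(max_cardinality(labels))  # 6

# Adding a label with 1,000 distinct values (for example, a per-pod label)
# multiplies the upper bound by 1,000.
labels["pod"] = [f"pod-{i}" for i in range(1000)]
print(max_cardinality(labels))  # 6000
```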
153+
154+
### Number of samples processed
155+
156+
The number of samples fetched impacts both memory usage and CPU time for decoding and processing.
157+
158+
#### Recommendations
159+
160+
Until downsampling is implemented, increasing the scrape interval (that is, scraping less frequently) reduces the number of samples to be processed. This comes at the cost of reduced resolution.
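
The relationship between scrape frequency and sample volume can be estimated with simple arithmetic: a longer interval between scrapes yields fewer samples per series over the same time range. The series count and intervals below are assumed values for illustration, and `samples_fetched` is a hypothetical helper.

```python
def samples_fetched(num_series: int, range_seconds: int, scrape_interval_seconds: int) -> int:
    """Rough number of raw samples a query must decode."""
    return num_series * (range_seconds // scrape_interval_seconds)


DAY = 24 * 3600

# 1,000 series over 24 hours at two assumed scrape intervals.
print(samples_fetched(1000, DAY, 15))  # 5760000
print(samples_fetched(1000, DAY, 60))  # 1440000
```

In this example, scraping every 60 seconds instead of every 15 seconds cuts the number of samples by a factor of four.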
161+
162+
### Number of evaluation steps
163+
164+
The number of evaluation steps for a range query is computed as:
165+
166+
```
num of steps = 1 + (end - start) / step
```
169+
170+
**Example:** A 24-hour query with a 1-minute step results in 1,441 evaluation steps.
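
The formula is easy to check numerically. The sketch below computes the step count for the 24-hour example and for a larger step; timestamps are plain seconds, integer division is assumed, and the helper names are hypothetical.

```python
def num_steps(start_s: int, end_s: int, step_s: int) -> int:
    """Number of evaluation steps of a range query (integer division assumed)."""
    return 1 + (end_s - start_s) // step_s


def eval_timestamps(start_s: int, end_s: int, step_s: int):
    """Timestamps at which the query is evaluated, inclusive of both ends."""
    return list(range(start_s, end_s + 1, step_s))


DAY = 24 * 3600

print(num_steps(0, DAY, 60))    # 1441
print(num_steps(0, DAY, 300))   # 289
print(len(eval_timestamps(0, DAY, 60)))
```

Moving from a 1-minute to a 5-minute step reduces the work from 1,441 evaluations to 289.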
171+
172+
#### Recommendations
173+
174+
Grafana can automatically set the step size based on the time range. If a query is slow, manually increasing the step parameter can reduce computational overhead.
175+
176+
### Time range of the query
177+
178+
Wider time ranges amplify the effects of cardinality, sample volume, and evaluation steps.
179+
180+
#### Recommendations
181+
182+
* Use shorter time ranges (e.g., 1h) in dashboards.
183+
* Default to instant queries during metric exploration to reduce load.
184+
185+
### Query Complexity
186+
187+
Subqueries, nested expressions, and advanced functions may lead to substantial CPU consumption.
188+
189+
#### Recommendations
190+
191+
* Simplify complex expressions where feasible.
192+
193+
### Regular Expressions
194+
195+
While Prometheus has optimized regex matching, such queries remain CPU-intensive.
196+
197+
#### Recommendations
198+
199+
* Avoid regex matchers in high-frequency queries.
200+
* Where possible, use equality matchers instead.
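
One way to see why equality matchers are cheaper is to compare the work each one does against the index. In the simplified model below, an equality matcher is a single dictionary lookup, while a regex matcher tests every distinct value of the label. This is a sketch only: the real implementation includes optimizations for some patterns, Python's `re` stands in for RE2, and the helper names are hypothetical.

```python
import re

# Distinct values of the "job" label mapped to their postings lists.
job_values = {"envoy": [1, 2, 3], "prometheus": [4]}


def equal_lookup(values_to_ids, value):
    """Equality matcher: one lookup, regardless of how many values exist."""
    return values_to_ids.get(value, []), 1


def regex_lookup(values_to_ids, pattern):
    """Regex matcher: every distinct label value is tested against the pattern."""
    compiled = re.compile(pattern)
    ids = []
    tested = 0
    for value, series_ids in values_to_ids.items():
        tested += 1
        if compiled.fullmatch(value):
            ids.extend(series_ids)
    return sorted(ids), tested


print(equal_lookup(job_values, "envoy"))
print(regex_lookup(job_values, "env.*"))
```

Both calls return the same series, but the regex version's cost grows with the number of distinct values of the label.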
201+
202+
### Query Result Size
203+
204+
Queries returning large datasets (>100MB) can incur significant serialization and network transfer costs.
205+
206+
#### Example
207+
208+
```
pod_container_info  # No aggregation
sum by (pod) (rate(container_cpu_seconds_total[1m]))  # High-cardinality result
```
212+
213+
#### Recommendations
214+
215+
* Scope the query with additional label matchers to reduce the result size and improve performance.
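
A rough size estimate for a range query result is the number of returned series multiplied by the number of evaluation steps and the bytes per data point. The 25 bytes per point used below is purely an assumed figure for a serialized timestamp and value, not a measured one, and `estimate_result_bytes` is a hypothetical helper.

```python
def estimate_result_bytes(num_series: int, num_steps: int, bytes_per_point: int = 25) -> int:
    """Very rough response size; bytes_per_point is an assumption, not a measurement."""
    return num_series * num_steps * bytes_per_point


# 10,000 series over 24 hours at a 1-minute step (1,441 steps).
unscoped = estimate_result_bytes(10_000, 1441)
# The same query scoped down to 100 series with extra label matchers.
scoped = estimate_result_bytes(100, 1441)

print(f"{unscoped / 1e6:.1f} MB")
print(f"{scoped / 1e6:.1f} MB")
```

Under these assumptions, the unscoped query crosses the 100MB mark while the scoped one stays in the low megabytes.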
216+
217+
## Summary
218+
219+
The key optimization techniques are:
220+
221+
* Use selective label matchers to limit cardinality.
222+
* Increase the step value in long-range queries.
223+
* Simplify complex or nested PromQL expressions.
224+
* Avoid regex matchers unless strictly necessary.
225+
* Favor instant queries for interactive use cases.
226+
* Scope queries to minimize the result size.
227+
