| 1 | +--- |
| 2 | +date: 2025-04-29 |
| 3 | +title: "Optimizing PromQL queries: A deep dive" |
| 4 | +linkTitle: Optimizing PromQL queries |
| 5 | +tags: [ "blog", "cortex", "query", "optimization" ] |
| 6 | +categories: [ "blog" ] |
| 7 | +projects: [ "cortex" ] |
| 8 | +description: > |
| 9 | + This guide explains how Cortex evaluates PromQL queries, details how time series data is stored and retrieved, and offers strategies to write performant queries—particularly in high-cardinality environments. |
| 10 | +author: Harry John ([@harry671003](https://github.com/harry671003)) |
| 11 | +--- |
| 12 | + |
| 13 | + |
| 14 | +## Introduction |
| 15 | + |
| 16 | +This guide explains how Cortex evaluates PromQL queries, details how time series data is stored and retrieved, and offers strategies to write performant queries — particularly in high-cardinality environments. |
| 17 | + |
| 18 | +Note: If you are new to PromQL, start with the [Querying basics documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/). |
| 19 | + |
| 20 | +## Prometheus Concepts |
| 21 | + |
| 22 | +### Data Model |
| 23 | + |
| 24 | +Prometheus employs a straightforward data model: |
| 25 | + |
| 26 | +* Each time series is uniquely identified by a metric name and a set of label-value pairs. |
| 27 | +* Each sample includes: |
| 28 | + * A millisecond-precision timestamp |
| 29 | + * A 64-bit floating-point value |
| 30 | + |
| 31 | +### Label Matchers |
| 32 | + |
| 33 | +Label matchers define the selection criteria for time series within the TSDB. Consider the following PromQL expression: |
| 34 | + |
| 35 | +``` |
| 36 | +http_requests_total{cluster="prod", job="envoy"} |
| 37 | +``` |
| 38 | + |
| 39 | +In this expression, the label matchers are: |
| 40 | + |
| 41 | +* `__name__="http_requests_total"` |
| 42 | +* `cluster="prod"` |
| 43 | +* `job="envoy"` |
| 44 | + |
| 45 | + |
| 46 | +Prometheus supports four types of label matchers: |
| 47 | + |
| 48 | +|Type |Syntax |Example | |
| 49 | +|--- |--- |--- | |
| 50 | +|Equal |label="value" |job="envoy" | |
| 51 | +|Not Equal |label!="value" |job!="prometheus" | |
| 52 | +|Regex Equal |label=~"regex" |job=~"env.*" | |
| 53 | +|Regex Not Equal |label!~"regex" |status!~"4.." | |
| 54 | + |
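The four matcher types can be illustrated with a small Python sketch. This is not Prometheus code: the `Matcher` class is a hypothetical helper, and Python's `re` module stands in for the RE2 syntax that Prometheus uses, which is only equivalent for simple patterns like the ones in the table. The sketch relies on two Prometheus behaviors: regex matchers are fully anchored, and a label that is missing from a series is treated as an empty string.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Matcher:
    """Hypothetical stand-in for a PromQL label matcher."""
    name: str
    op: str      # one of "=", "!=", "=~", "!~"
    value: str

    def matches(self, labels: dict) -> bool:
        # A label that is absent from the series is treated as "".
        actual = labels.get(self.name, "")
        if self.op == "=":
            return actual == self.value
        if self.op == "!=":
            return actual != self.value
        # Regex matchers are fully anchored: the pattern must match
        # the whole label value, not just a substring.
        hit = re.fullmatch(self.value, actual) is not None
        if self.op == "=~":
            return hit
        if self.op == "!~":
            return not hit
        raise ValueError(f"unknown operator: {self.op}")


series = {
    "__name__": "http_requests_total",
    "cluster": "prod",
    "job": "envoy",
    "status": "400",
}

print(Matcher("job", "=~", "env.*").matches(series))   # True
print(Matcher("job", "=~", "env").matches(series))     # False, anchored match
print(Matcher("status", "!~", "4..").matches(series))  # False, "400" matches 4..
```

Because of the anchoring, `job=~"env"` does not select `job="envoy"`; the pattern has to cover the full value, as `env.*` does.
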
| 55 | +## Time Series Storage in Cortex |
| 56 | + |
| 57 | +Cortex stores time series data in the Prometheus Time Series Database (TSDB). The TSDB is partitioned by time into blocks. Each block is a directory with the following contents: |
| 58 | + |
| 59 | +* `ID` - ID of the block ([ULID](https://github.com/ulid/spec)), used as the name of the block directory |
| 60 | +* `meta.json` - Contains the metadata of the block |
| 61 | +* `index` - A binary file that contains the index |
| 62 | +* `chunks` - Directory containing the chunk segment files. |
| 63 | + |
| 64 | +More details: [TSDB format docs](https://github.com/prometheus/prometheus/blob/main/tsdb/docs/format/README.md) |
| 65 | + |
| 66 | +### Index File |
| 67 | + |
| 68 | +The `index` file contains two key mappings for query processing: |
| 69 | + |
| 70 | +* **Postings Offset Table and Postings**: Maps label-value pairs to series IDs |
| 71 | +* **Series Section**: Maps series IDs to label sets and chunk references |
| 72 | + |
| 73 | +#### Example |
| 74 | + |
| 75 | +Given the following time series: |
| 76 | + |
| 77 | +``` |
| 78 | +http_requests_total{cluster="prod", job="envoy", status="200"} -> SeriesID(1) |
| 79 | +http_requests_total{cluster="prod", job="envoy", status="400"} -> SeriesID(2) |
| 80 | +http_requests_total{cluster="prod", job="envoy", status="500"} -> SeriesID(3) |
| 81 | +http_requests_total{cluster="prod", job="prometheus", status="200"} -> SeriesID(4) |
| 82 | +``` |
| 83 | + |
| 84 | +The index file would store mappings such as: |
| 85 | + |
| 86 | +``` |
| 87 | +__name__=http_requests_total → [1, 2, 3, 4] |
| 88 | +cluster=prod → [1, 2, 3, 4] |
| 89 | +job=envoy → [1, 2, 3] |
| 90 | +job=prometheus → [4] |
| 91 | +status=200 → [1, 4] |
| 92 | +status=400 → [2] |
| 93 | +status=500 → [3] |
| 94 | +``` |
| 95 | + |
| 96 | +### Chunks |
| 97 | + |
| 98 | +Each chunk segment file can store up to **512MB** of data. Each chunk in the segment file typically holds up to **120 samples**. |
| 99 | + |
| 100 | +## Query Execution in Cortex |
| 101 | + |
| 102 | +To optimize PromQL queries effectively, it is essential to understand how queries are executed within Cortex. Consider the following example: |
| 103 | + |
| 104 | +``` |
| 105 | +sum(rate(http_requests_total{cluster="prod", job="envoy"}[5m])) |
| 106 | +``` |
| 107 | + |
| 108 | +### Block Selection |
| 109 | + |
| 110 | +Cortex first identifies the TSDB blocks that overlap the query's time range. This step is fast and adds little overhead to query execution. |
| 111 | + |
| 112 | +### Series Selection |
| 113 | + |
| 114 | +Next, Cortex uses the inverted index to retrieve the set of matching series IDs for each label matcher. For example: |
| 115 | + |
| 116 | +``` |
| 117 | +__name__="http_requests_total" → [1, 2, 3, 4] |
| 118 | +cluster="prod" → [1, 2, 3, 4] |
| 119 | +job="envoy" → [1, 2, 3] |
| 120 | +``` |
| 121 | + |
| 122 | +The intersection of these sets yields: |
| 123 | + |
| 124 | +``` |
| 125 | +http_requests_total{cluster="prod", job="envoy", status="200"} |
| 126 | +http_requests_total{cluster="prod", job="envoy", status="400"} |
| 127 | +http_requests_total{cluster="prod", job="envoy", status="500"} |
| 128 | +``` |
| 129 | + |
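The series selection step can be sketched in Python as an intersection of sorted postings lists. This is a simplified illustration, not the TSDB implementation: the `postings` dictionary mirrors the example index from this post, the function names are hypothetical, and only equality matchers are handled.

```python
# Postings from the example index: (label, value) -> sorted series IDs.
postings = {
    ("__name__", "http_requests_total"): [1, 2, 3, 4],
    ("cluster", "prod"): [1, 2, 3, 4],
    ("job", "envoy"): [1, 2, 3],
    ("job", "prometheus"): [4],
    ("status", "200"): [1, 4],
    ("status", "400"): [2],
    ("status", "500"): [3],
}


def intersect_sorted(a, b):
    """Intersect two sorted ID lists with a two-pointer walk."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def select_series(matchers):
    """Return the series IDs that satisfy every equality matcher."""
    lists = [postings.get(m, []) for m in matchers]
    if not lists:
        return []
    result = lists[0]
    for ids in lists[1:]:
        result = intersect_sorted(result, ids)
    return result


selected = select_series([
    ("__name__", "http_requests_total"),
    ("cluster", "prod"),
    ("job", "envoy"),
])
print(selected)  # [1, 2, 3]
```

Adding one more selective matcher, such as `("status", "200")`, shrinks the result to `[1]`, which is why selective matchers reduce the work done in the later stages.
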
| 130 | +### Sample Selection |
| 131 | + |
| 132 | +The mapping from series to chunks is used to identify the relevant chunks from the chunk segment files. These chunks are decoded to retrieve the underlying time series samples. |
| 133 | + |
| 134 | +### PromQL Evaluation |
| 135 | + |
| 136 | +Using the retrieved series and samples, the PromQL engine evaluates the query. There are two modes of running queries: |
| 137 | + |
| 138 | +* **Instant queries** – Evaluated at a single timestamp |
| 139 | +* **Range queries** – Evaluated at regular intervals over a defined time range |
| 140 | + |
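The two modes map to two endpoints of the Prometheus HTTP API: `/api/v1/query` for instant queries and `/api/v1/query_range` for range queries. The Python sketch below only builds the request URLs and sends nothing. `BASE_URL` is a hypothetical address, and the path prefix and any required headers depend on how a given Cortex deployment is configured.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; adjust the host and prefix for a real deployment.
BASE_URL = "http://cortex.example.com/prometheus"
QUERY = 'sum(rate(http_requests_total{cluster="prod", job="envoy"}[5m]))'


def instant_query_url(query, time):
    """Instant query: evaluated once, at `time` (Unix seconds)."""
    params = urlencode({"query": query, "time": time})
    return f"{BASE_URL}/api/v1/query?{params}"


def range_query_url(query, start, end, step):
    """Range query: evaluated every `step` seconds from `start` to `end`."""
    params = urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{BASE_URL}/api/v1/query_range?{params}"


end = 1745884800          # an arbitrary Unix timestamp used for illustration
start = end - 3600        # one hour earlier

instant_url = instant_query_url(QUERY, end)
range_url = range_query_url(QUERY, start, end, 60)
print(instant_url)
print(range_url)
```

The range query carries the `start`, `end`, and `step` parameters that determine how many times the expression is evaluated.
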
| 141 | +## Common Causes of Slow Queries and Optimization Techniques |
| 142 | + |
| 143 | +Several factors influence the latency and resource usage of PromQL queries. This section highlights the key contributors and practical strategies for improving performance. |
| 144 | + |
| 145 | +### Query Cardinality |
| 146 | + |
| 147 | +High cardinality increases the number of time series that must be scanned and evaluated. |
| 148 | + |
| 149 | +#### Recommendations |
| 150 | + |
| 151 | +* Eliminate unnecessary labels from metrics. |
| 152 | +* Use selective label matchers to reduce the number of series returned. |
| 153 | + |
| 154 | +### Number of Samples Processed |
| 155 | + |
| 156 | +The number of samples fetched impacts both memory usage and CPU time for decoding and processing. |
| 157 | + |
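A rough estimate of the data a query must fetch follows from the number of matching series, the time range, and the scrape interval. The Python sketch below is a back-of-the-envelope model only: the function name is hypothetical, every series is assumed to be scraped at a fixed interval for the whole range, and every chunk is assumed to be full at 120 samples.

```python
import math

SAMPLES_PER_CHUNK = 120  # typical upper bound per chunk (assumed full here)


def estimate_fetch(num_series, range_seconds, scrape_interval_seconds):
    """Estimate samples and chunks touched by a query (simplified model)."""
    samples_per_series = range_seconds // scrape_interval_seconds
    chunks_per_series = math.ceil(samples_per_series / SAMPLES_PER_CHUNK)
    return {
        "samples": num_series * samples_per_series,
        "chunks": num_series * chunks_per_series,
    }


day = 24 * 60 * 60
print(estimate_fetch(3, day, 15))  # {'samples': 17280, 'chunks': 144}
print(estimate_fetch(3, day, 60))  # {'samples': 4320, 'chunks': 36}
```

In this model, moving from a 15-second to a 60-second scrape interval cuts the samples to decode by a factor of four for the same three series and the same 24-hour range.
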
| 158 | +#### Recommendations |
| 159 | + |
| 160 | +Until downsampling is implemented, increasing the scrape interval (scraping less frequently) can lower the number of samples to be processed. This comes at the cost of reduced resolution. |
| 161 | + |
| 162 | +### Number of Evaluation Steps |
| 163 | + |
| 164 | +The number of evaluation steps for a range query is computed as: |
| 165 | + |
| 166 | +``` |
| 167 | +num of steps = 1 + (end - start) / step |
| 168 | +``` |
| 169 | + |
| 170 | +**Example:** A 24-hour query with a 1-minute step results in 1,441 evaluation steps. |
| 171 | + |
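The step formula is easy to check with a few lines of Python. The function name below is hypothetical, all values are in seconds, and integer division is assumed because evaluation happens at `start`, `start + step`, and so on, up to the last timestamp that does not exceed `end`.

```python
def num_steps(start, end, step):
    """Number of evaluation timestamps in a range query (seconds)."""
    return 1 + (end - start) // step


day = 24 * 60 * 60
print(num_steps(0, day, 60))   # 1441: 24 hours at a 1-minute step
print(num_steps(0, day, 300))  # 289: the same range at a 5-minute step
```

Raising the step from 1 minute to 5 minutes reduces the number of evaluations for the same 24-hour range from 1,441 to 289.
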
| 172 | +#### Recommendations |
| 173 | + |
| 174 | +Grafana can automatically set the step size based on the time range. If a query is slow, manually increasing the step parameter can reduce computational overhead. |
| 175 | + |
| 176 | +### Time Range of the Query |
| 177 | + |
| 178 | +Wider time ranges amplify the effects of cardinality, sample volume, and evaluation steps. |
| 179 | + |
| 180 | +#### Recommendations |
| 181 | + |
| 182 | +* Use shorter time ranges (e.g., 1h) in dashboards. |
| 183 | +* Default to instant queries during metric exploration to reduce load. |
| 184 | + |
| 185 | +### Query Complexity |
| 186 | + |
| 187 | +Subqueries, nested expressions, and advanced functions may lead to substantial CPU consumption. |
| 188 | + |
| 189 | +#### Recommendations |
| 190 | + |
| 191 | +* Simplify complex expressions where feasible. |
| 192 | + |
| 193 | +### Regular Expressions |
| 194 | + |
| 195 | +While Prometheus has optimized regex matching, such queries remain CPU-intensive. |
| 196 | + |
| 197 | +#### Recommendations |
| 198 | + |
| 199 | +* Avoid regex matchers in high-frequency queries. |
| 200 | +* Where possible, use equality matchers instead. |
| 201 | + |
| 202 | +### Query Result Size |
| 203 | + |
| 204 | +Queries returning large datasets (>100MB) can incur significant serialization and network transfer costs. |
| 205 | + |
| 206 | +#### Example |
| 207 | + |
| 208 | +``` |
| 209 | +pod_container_info # No aggregation |
| 210 | +sum by (pod) (rate(container_cpu_seconds_total[1m])) # High cardinality result |
| 211 | +``` |
| 212 | + |
| 213 | +#### Recommendations |
| 214 | + |
| 215 | +* Scope the query with additional label matchers to reduce the result size and improve performance. |
| 216 | + |
| 217 | +## Summary |
| 218 | + |
| 219 | +The key optimization techniques are: |
| 220 | + |
| 221 | +* Use selective label matchers to limit cardinality. |
| 222 | +* Increase the step value in long-range queries. |
| 223 | +* Simplify complex or nested PromQL expressions. |
| 224 | +* Avoid regex matchers unless strictly necessary. |
| 225 | +* Favor instant queries for interactive use cases. |
| 226 | +* Scope queries to minimize the result size. |
| 227 | + |