
Strange Resource group behaviour and Cluster memory metrics #25226

nikita-sheremet-java-developer opened this issue Mar 5, 2025 · 0 comments


The configuration:

  1. Trino coordinator: 16 CPU, maxHeapSize 63000M
  2. Trino workers: 16 CPU, maxHeapSize 50000M each

Resource group config:

{
  "rootGroups": [
    {
      "name": "global",
      "schedulingPolicy": "fair",
      "hardConcurrencyLimit": 1000,
      "softMemoryLimit": "100%",
      "maxQueued": 20,
      "jmxExport": true,
      "subGroups": [
        {
          "name": "common",
          "hardConcurrencyLimit": 100,
          "maxQueued": 100,
          "softMemoryLimit": "40%",
          "jmxExport": true,
          "subGroups": [
            {
              "name": "etl",
              "hardConcurrencyLimit": 100,
              "maxQueued": 100,
              "softMemoryLimit": "10%",
              "jmxExport": true
            },
            {
              "name": "analytics",
              "hardConcurrencyLimit": 100,
              "maxQueued": 100,
              "softMemoryLimit": "10%",
              "jmxExport": true
            }
          ]
        }
      ]
    }
  ]
}

Test scenario:
I send 8 queries, 4 per group. Queries are sent in pairs, one per group, with a 20-second delay between pairs: 2 queries start at the same time, then after 20 seconds the next 2 start, and so on.
One query takes about 100 GB of memory.
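For reference, a minimal sketch of the submission loop, assuming the trino Python client and that queries are routed to the etl/analytics subgroups by a source-based selector; the hostname, user, and query are placeholders:

import threading
import time

import trino  # assumed: trino-python-client

# Placeholder for a query that uses roughly 100 GB of distributed memory.
HEAVY_QUERY = "SELECT col, count(*) FROM hive.default.big_table GROUP BY col"

def run_query(source: str) -> None:
    # `source` is assumed to be matched by a resource group selector that
    # routes queries to global.common.etl / global.common.analytics.
    conn = trino.dbapi.connect(
        host="trino-coordinator.example.com",  # placeholder
        port=8080,
        user="tester",
        source=source,
    )
    cur = conn.cursor()
    cur.execute(HEAVY_QUERY)
    cur.fetchall()

threads = []
for _ in range(4):  # 4 pairs = 8 queries total
    for source in ("etl", "analytics"):  # one query per group
        t = threading.Thread(target=run_query, args=(source,))
        t.start()
        threads.append(t)
    time.sleep(20)  # 20-second delay before the next pair

for t in threads:
    t.join()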

Scenarios with autoscaling

First scenario

  1. There are 4 workers. I submit 8 queries and get 2 queries running and 6 queries queued.
  2. Trino workers scale up to 10, so now there is more memory.
  3. The 6 queued queries start running.
  4. The queries finish.
  5. I replay the query-submission scenario, but now with 10 workers from the beginning.
  6. I expect 6-8 queries to run, but I get 2 queries running and 6 queued.

10 workers * 50000M ≈ 488.28 GB, and 10% of that is ~48.8 GB. So if one query takes more than that, all the others must be queued. But why do all 6 queued queries start running after autoscaling? Not one, not two, but all of them?
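A quick back-of-the-envelope check of those numbers (a sketch; it assumes the percentage softMemoryLimit is resolved against total cluster memory):

WORKER_HEAP_MB = 50000
QUERY_MEMORY_GB = 100  # one query takes roughly 100 GB

for workers in (4, 10):
    cluster_gb = workers * WORKER_HEAP_MB / 1024   # total cluster memory
    common_gb = cluster_gb * 0.40                  # global.common: 40%
    subgroup_gb = cluster_gb * 0.10                # etl / analytics: 10%
    print(f"{workers} workers: cluster ~{cluster_gb:.1f} GB, "
          f"common ~{common_gb:.1f} GB, subgroup ~{subgroup_gb:.1f} GB")

# 4 workers:  cluster ~195.3 GB, common ~78.1 GB, subgroup ~19.5 GB
# 10 workers: cluster ~488.3 GB, common ~195.3 GB, subgroup ~48.8 GB
# In both cases a single ~100 GB query already exceeds the 10% subgroup
# limit, so any further queries in that subgroup should be queued.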

Second scenario
After I scaled the cluster from 4 to 10 workers:

  1. The Trino UI shows 10 workers.
  2. I kill 6 workers and wait until the Trino UI shows 4 nodes.
  3. Kubernetes restores the 6 workers, and the Trino UI shows 10 nodes again.
  4. I kill 6 workers again, and Kubernetes restores those workers.
  5. The Trino UI shows 10 nodes.

See the attached image. The pod count stays the same (I think Kubernetes did not react to the brief drop in pods). In any case, if the pod count is the same or lower, the cluster memory must be the same or lower; it must not increase.

Expected result:

  1. trino_memory_ClusterMemoryManager_ClusterMemoryBytes shows ~500 GB.
  2. When I submit the queries, 2 queries run and 6 are queued.

Actual result:

  1. trino_memory_ClusterMemoryManager_ClusterMemoryBytes shows ~1 TB. After about 30-40 minutes the metric returns to ~500 GB. All the while the Trino UI shows only 10 workers.
  2. The resource groups allow more than 2 queries to run. I do not remember exactly, but ~6 queries were running.

My assumption is that resource groups look at the trino_memory_ClusterMemoryManager_ClusterMemoryBytes metric and that these two problems are related. But who knows?
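One way to check this correlation is to poll the ClusterMemoryManager MBean through the jmx catalog while killing and restoring workers, and watch whether the reported value jumps to ~1 TB. A sketch, assuming the jmx connector is enabled and that the MBean and column names below match this Trino version (they are derived from the trino_memory_ClusterMemoryManager_ClusterMemoryBytes metric name):

import time

import trino  # assumed: trino-python-client

conn = trino.dbapi.connect(
    host="trino-coordinator.example.com",  # placeholder
    port=8080,
    user="tester",
    catalog="jmx",
    schema="current",
)

for _ in range(120):  # poll every 30 s for about an hour
    cur = conn.cursor()
    cur.execute(
        'SELECT clustermemorybytes '
        'FROM "trino.memory:name=clustermemorymanager"'
    )
    (cluster_memory_bytes,) = cur.fetchone()
    print(time.strftime("%H:%M:%S"), round(cluster_memory_bytes / 1024**3, 1), "GB")
    time.sleep(30)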
