Test Scenario:
I send 8 queries, 4 queries per group. The queries are sent with a 20-second delay: 2 queries start at the same time, one per group, then after 20 seconds the next 2 queries start, and so on.
One query takes about 100 GB of memory.
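For context, a minimal resource-groups JSON matching this setup could look like the sketch below. The group names, selectors, concurrency limits, and queue sizes are illustrative assumptions rather than my exact config; the relevant part is the per-group softMemoryLimit of 10%:

{
  "rootGroups": [
    {
      "name": "group1",
      "softMemoryLimit": "10%",
      "hardConcurrencyLimit": 4,
      "maxQueued": 10
    },
    {
      "name": "group2",
      "softMemoryLimit": "10%",
      "hardConcurrencyLimit": 4,
      "maxQueued": 10
    }
  ],
  "selectors": [
    { "source": "app1", "group": "group1" },
    { "source": "app2", "group": "group2" }
  ]
}

With a 10% soft limit per group and ~100 GB per query, each group should admit roughly one query at a time on a ~500 GB cluster.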
Scenarios with autoscaling
First scenario
There are 4 workers. I submit 8 queries. I get 2 queries running and 6 queries in the queue.
Trino workers are scaled up to 10, so now there is more memory.
The 6 queued queries become running.
The queries finish.
I replay the query-submission scenario, but now with 10 workers from the beginning.
I expect that 6-8 queries will run, but I get 2 queries running and 6 queued.
10 workers * 50000 MB = 500,000 MB ≈ 488.28 GB, and 10% of that is ~48 GB. Since one query takes about 100 GB, the group is over its soft memory limit and all other queries must be queued. But then why, after autoscaling, do 6 queries become running? Not one, not two, but all of the queued ones?
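One way to see what the coordinator itself thinks at that moment is to query system.runtime.queries; a sketch like the one below should show each query's state and its resource group (I am only assuming the standard system catalog here):

-- shows which queries Trino currently considers queued vs running, and in which resource group
SELECT query_id, state, resource_group_id
FROM system.runtime.queries
WHERE state IN ('QUEUED', 'RUNNING');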
Second scenario
After I scaled the cluster from 4 to 10 workers:
Trino UI shows 10 workers.
I killed 6 workers and waited until the Trino UI showed 4 nodes.
k8s restored the 6 workers - the Trino UI showed 10 nodes.
I killed the 6 workers again and k8s restored those workers.
The Trino UI showed 10 nodes.
See the image attached. The pod count stays the same (I think k8s did not react to the quick pod decrease), but either way, if the pod count is the same or lower, memory must be the same or lower - it must not increase.
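To cross-check the node count the coordinator actually tracks, independently of the UI and of the pod count, something like this should work (again just assuming the standard system catalog):

-- lists every node the coordinator currently knows about and its state
SELECT node_id, state, coordinator
FROM system.runtime.nodes;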
Expected result:
trino_memory_ClusterMemoryManager_ClusterMemoryBytes shows ~500 GB.
When I submit queries, 2 queries are running and 6 are queued.
Actual result:
trino_memory_ClusterMemoryManager_ClusterMemoryBytes shows ~1 TB. After about 30-40 minutes the metric returned to ~500 GB. Again, the Trino UI showed only 10 workers.
Resource groups allowed more than 2 queries to run. I do not remember exactly, but ~6 queries ran.
My assumption is that resource groups look at the trino_memory_ClusterMemoryManager_ClusterMemoryBytes metric and these 2 problems are related. But who knows?
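For what it's worth, the same value can also be read from inside Trino through the JMX connector, which might help correlate it with the resource group decisions; this sketch assumes a jmx catalog is configured:

-- the clustermemorybytes column should correspond to trino_memory_ClusterMemoryManager_ClusterMemoryBytes
SELECT *
FROM jmx.current."trino.memory:name=clustermemorymanager";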