You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It looks there is an inneficiency on the execution of window functions containing UNBOUNDED FOLLOWING that does not happen using equivalent queries using either a big but constant value instead of UNBOUNDED or reversing the sort and using UNBOUNDED PRECEDING.
Summary of execution time differences (local Presto with just one worker):
Query using ORDER DESC + UNBOUNDED FOLLOWING : 49.67s
Query using ORDER DESC + 2147483647 FOLLOWING : 14.85s
Query using ORDER ASC + UNBOUNDED PRECEDING : 3.45s
Queries:
-- Query using ORDER DESC + UNBOUNDED FOLLOWING : 49.67s
SELECT suppkey, name, nationkey,
SUM(acctbal) OVER (
PARTITION BY nationkey
ORDER BY cast (replace(phone,'-','') as bigint) **DESC**
RANGE BETWEEN 2000 PRECEDING AND **UNBOUNDED FOLLOWING**
)
FROM tpch.sf100.supplier limit 100
-- Query using ORDER DESC + 2147483647 FOLLOWING : 14.85s
SELECT suppkey, name, nationkey,
SUM(acctbal) OVER (
PARTITION BY nationkey
ORDER BY cast (replace(phone,'-','') as bigint) DESC
RANGE BETWEEN 2000 PRECEDING AND 2147483647 FOLLOWING
)
FROM tpch.sf100.supplier limit 100
-- Query using ORDER ASC + UNBOUNDED PRECEDING : 3.45s
SELECT suppkey, name, nationkey,
SUM(acctbal) OVER (
PARTITION BY nationkey
ORDER BY cast (replace(phone,'-','') as bigint) ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND 2000 FOLLOWING
)
FROM tpch.sf100.supplier limit 100
It looks there is an inneficiency on the execution of window functions containing UNBOUNDED FOLLOWING that does not happen using equivalent queries using either a big but constant value instead of UNBOUNDED or reversing the sort and using UNBOUNDED PRECEDING.
Summary of execution time differences (local Presto with just one worker):
Queries:
-- Query using ORDER DESC + UNBOUNDED FOLLOWING : 49.67s
EXPLAIN VERBOSE
- Output[PlanNodeId 11][suppkey, name, nationkey, _col3] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} _col3 := sum (2:5) - Limit[PlanNodeId 350][100] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LocalExchange[PlanNodeId 534][SINGLE] () => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - RemoteStreamingExchange[PlanNodeId 487][GATHER] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LimitPartial[PlanNodeId 486][100] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - Project[PlanNodeId 575][projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, sum:double] - Window[PlanNodeId 5][partition by (nationkey), order by (expr DESC_NULLS_LAST)][$hashvalue] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue:bigint, sum:double] sum := sum(acctbal) RANGE 2000 PRECEDING UNBOUNDED_FOLLOWING (2:5) - LocalExchange[PlanNodeId 533][HASH][$hashvalue] (nationkey) => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 369,000,000.00, memory: 0.00, network: 77,000,000.00} - RemoteStreamingExchange[PlanNodeId 485][REPARTITION][$hashvalue_20] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue_20:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 292,000,000.00, memory: 0.00, network: 77,000,000.00} - Project[PlanNodeId 150][projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue_21:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 215,000,000.00, memory: 0.00, network: 0.00} $operator$add := (expr) + (BIGINT'2000') - ScanProject[PlanNodeId 0,1][table = TableHandle {connectorId='tpch', connectorHandle='supplier:sf100.0', layout='Optional[supplier:sf100.0]'}, projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $hashvalue_21:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 70,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 138,000,000.00, memory: 0.00, network: 0.00} expr := CAST(replace(phone, VARCHAR'-', VARCHAR'') AS bigint) (7:7) $hashvalue_21 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(nationkey), BIGINT'0')) (7:7) acctbal := tpch:acctbal (7:6) name := tpch:name (7:6) nationkey := tpch:nationkey (7:6) phone := tpch:phone (7:6) suppkey := tpch:suppkey (7:6)
-- Query using ORDER DESC + 2147483647 FOLLOWING : 14.85s
EXPLAIN VERBOSE
- Output[PlanNodeId 13][suppkey, name, nationkey, _col3] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} _col3 := sum (2:5) - Limit[PlanNodeId 425][100] => [name:varchar(25), suppkey:bigint, nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LocalExchange[PlanNodeId 609][SINGLE] () => [name:varchar(25), suppkey:bigint, nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - RemoteStreamingExchange[PlanNodeId 562][GATHER] => [name:varchar(25), suppkey:bigint, nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LimitPartial[PlanNodeId 561][100] => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - Project[PlanNodeId 650][projectLocality = LOCAL] => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, sum:double] - Window[PlanNodeId 7][partition by (nationkey), order by (expr DESC_NULLS_LAST)][$hashvalue] => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, $hashvalue:bigint, sum:double] sum := sum(acctbal) RANGE 2000 PRECEDING 2147483647 FOLLOWING (2:5) - LocalExchange[PlanNodeId 608][HASH][$hashvalue] (nationkey) => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, $hashvalue:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 396,000,000.00, memory: 0.00, network: 86,000,000.00} - RemoteStreamingExchange[PlanNodeId 560][REPARTITION][$hashvalue_22] => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, $hashvalue_22:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 310,000,000.00, memory: 0.00, network: 86,000,000.00} - Project[PlanNodeId 184][projectLocality = LOCAL] => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, $hashvalue_23:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 224,000,000.00, memory: 0.00, network: 0.00} $operator$add := (expr) + (BIGINT'2000') $operator$subtract := (expr) - (BIGINT'2147483647') - ScanProject[PlanNodeId 0,1][table = TableHandle {connectorId='tpch', connectorHandle='supplier:sf100.0', layout='Optional[supplier:sf100.0]'}, projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $hashvalue_23:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 70,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 138,000,000.00, memory: 0.00, network: 0.00} expr := CAST(replace(phone, VARCHAR'-', VARCHAR'') AS bigint) (7:7) $hashvalue_23 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(nationkey), BIGINT'0')) (7:7) acctbal := tpch:acctbal (7:6) name := tpch:name (7:6) nationkey := tpch:nationkey (7:6) phone := tpch:phone (7:6) suppkey := tpch:suppkey (7:6)
-- Query using ORDER ASC + UNBOUNDED PRECEDING : 3.45s
EXPLAIN VERBOSE
- Output[PlanNodeId 11][suppkey, name, nationkey, _col3] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} _col3 := sum (2:5) - Limit[PlanNodeId 350][100] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LocalExchange[PlanNodeId 534][SINGLE] () => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - RemoteStreamingExchange[PlanNodeId 487][GATHER] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LimitPartial[PlanNodeId 486][100] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - Project[PlanNodeId 575][projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, sum:double] - Window[PlanNodeId 5][partition by (nationkey), order by (expr ASC_NULLS_LAST)][$hashvalue] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue:bigint, sum:double] sum := sum(acctbal) RANGE UNBOUNDED_PRECEDING 2000 FOLLOWING (2:5) - LocalExchange[PlanNodeId 533][HASH][$hashvalue] (nationkey) => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 369,000,000.00, memory: 0.00, network: 77,000,000.00} - RemoteStreamingExchange[PlanNodeId 485][REPARTITION][$hashvalue_20] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue_20:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 292,000,000.00, memory: 0.00, network: 77,000,000.00} - Project[PlanNodeId 150][projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue_21:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 215,000,000.00, memory: 0.00, network: 0.00} $operator$add := (expr) + (BIGINT'2000') - ScanProject[PlanNodeId 0,1][table = TableHandle {connectorId='tpch', connectorHandle='supplier:sf100.0', layout='Optional[supplier:sf100.0]'}, projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $hashvalue_21:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 70,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 138,000,000.00, memory: 0.00, network: 0.00} expr := CAST(replace(phone, VARCHAR'-', VARCHAR'') AS bigint) (7:7) $hashvalue_21 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(nationkey), BIGINT'0')) (7:7) acctbal := tpch:acctbal (7:6) name := tpch:name (7:6) nationkey := tpch:nationkey (7:6) phone := tpch:phone (7:6) suppkey := tpch:suppkey (7:6)
Expected Behavior or Use Case
It looks that it should be possible to apply an optimization on this particular case
Context
Find it during testing, it is not currently affecting a real environment
The text was updated successfully, but these errors were encountered: