Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve inefficient plan in queries with window functions using UNBOUNDED FOLLOWING #24501

Open
psantos-denodo opened this issue Feb 5, 2025 · 0 comments

Comments

@psantos-denodo
Copy link

It looks there is an inneficiency on the execution of window functions containing UNBOUNDED FOLLOWING that does not happen using equivalent queries using either a big but constant value instead of UNBOUNDED or reversing the sort and using UNBOUNDED PRECEDING.

Summary of execution time differences (local Presto with just one worker):

  • Query using ORDER DESC + UNBOUNDED FOLLOWING : 49.67s
  • Query using ORDER DESC + 2147483647 FOLLOWING : 14.85s
  • Query using ORDER ASC + UNBOUNDED PRECEDING : 3.45s

Queries:

-- Query using ORDER DESC + UNBOUNDED FOLLOWING : 49.67s

SELECT suppkey, name, nationkey,
    SUM(acctbal) OVER (
      PARTITION BY nationkey
      ORDER BY cast (replace(phone,'-','') as bigint) **DESC**
      RANGE BETWEEN 2000 PRECEDING AND **UNBOUNDED FOLLOWING**
    )
FROM tpch.sf100.supplier limit 100

EXPLAIN VERBOSE

- Output[PlanNodeId 11][suppkey, name, nationkey, _col3] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} _col3 := sum (2:5) - Limit[PlanNodeId 350][100] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LocalExchange[PlanNodeId 534][SINGLE] () => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - RemoteStreamingExchange[PlanNodeId 487][GATHER] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LimitPartial[PlanNodeId 486][100] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - Project[PlanNodeId 575][projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, sum:double] - Window[PlanNodeId 5][partition by (nationkey), order by (expr DESC_NULLS_LAST)][$hashvalue] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue:bigint, sum:double] sum := sum(acctbal) RANGE 2000 PRECEDING UNBOUNDED_FOLLOWING (2:5) - LocalExchange[PlanNodeId 533][HASH][$hashvalue] (nationkey) => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 369,000,000.00, memory: 0.00, network: 77,000,000.00} - RemoteStreamingExchange[PlanNodeId 485][REPARTITION][$hashvalue_20] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue_20:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 292,000,000.00, memory: 0.00, network: 77,000,000.00} - Project[PlanNodeId 150][projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue_21:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 215,000,000.00, memory: 0.00, network: 0.00} $operator$add := (expr) + (BIGINT'2000') - ScanProject[PlanNodeId 0,1][table = TableHandle {connectorId='tpch', connectorHandle='supplier:sf100.0', layout='Optional[supplier:sf100.0]'}, projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $hashvalue_21:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 70,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 138,000,000.00, memory: 0.00, network: 0.00} expr := CAST(replace(phone, VARCHAR'-', VARCHAR'') AS bigint) (7:7) $hashvalue_21 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(nationkey), BIGINT'0')) (7:7) acctbal := tpch:acctbal (7:6) name := tpch:name (7:6) nationkey := tpch:nationkey (7:6) phone := tpch:phone (7:6) suppkey := tpch:suppkey (7:6)

-- Query using ORDER DESC + 2147483647 FOLLOWING : 14.85s

SELECT suppkey, name, nationkey,
    SUM(acctbal) OVER (
      PARTITION BY nationkey
      ORDER BY cast (replace(phone,'-','') as bigint) DESC
      RANGE BETWEEN 2000 PRECEDING AND 2147483647 FOLLOWING
    )
FROM tpch.sf100.supplier limit 100

EXPLAIN VERBOSE

- Output[PlanNodeId 13][suppkey, name, nationkey, _col3] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} _col3 := sum (2:5) - Limit[PlanNodeId 425][100] => [name:varchar(25), suppkey:bigint, nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LocalExchange[PlanNodeId 609][SINGLE] () => [name:varchar(25), suppkey:bigint, nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - RemoteStreamingExchange[PlanNodeId 562][GATHER] => [name:varchar(25), suppkey:bigint, nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LimitPartial[PlanNodeId 561][100] => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - Project[PlanNodeId 650][projectLocality = LOCAL] => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, sum:double] - Window[PlanNodeId 7][partition by (nationkey), order by (expr DESC_NULLS_LAST)][$hashvalue] => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, $hashvalue:bigint, sum:double] sum := sum(acctbal) RANGE 2000 PRECEDING 2147483647 FOLLOWING (2:5) - LocalExchange[PlanNodeId 608][HASH][$hashvalue] (nationkey) => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, $hashvalue:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 396,000,000.00, memory: 0.00, network: 86,000,000.00} - RemoteStreamingExchange[PlanNodeId 560][REPARTITION][$hashvalue_22] => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, $hashvalue_22:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 310,000,000.00, memory: 0.00, network: 86,000,000.00} - Project[PlanNodeId 184][projectLocality = LOCAL] => [expr:bigint, name:varchar(25), suppkey:bigint, $operator$add:bigint, $operator$subtract:bigint, acctbal:double, nationkey:bigint, $hashvalue_23:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 224,000,000.00, memory: 0.00, network: 0.00} $operator$add := (expr) + (BIGINT'2000') $operator$subtract := (expr) - (BIGINT'2147483647') - ScanProject[PlanNodeId 0,1][table = TableHandle {connectorId='tpch', connectorHandle='supplier:sf100.0', layout='Optional[supplier:sf100.0]'}, projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $hashvalue_23:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 70,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 138,000,000.00, memory: 0.00, network: 0.00} expr := CAST(replace(phone, VARCHAR'-', VARCHAR'') AS bigint) (7:7) $hashvalue_23 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(nationkey), BIGINT'0')) (7:7) acctbal := tpch:acctbal (7:6) name := tpch:name (7:6) nationkey := tpch:nationkey (7:6) phone := tpch:phone (7:6) suppkey := tpch:suppkey (7:6)

-- Query using ORDER ASC + UNBOUNDED PRECEDING : 3.45s

SELECT suppkey, name, nationkey,
    SUM(acctbal) OVER (
      PARTITION BY nationkey
      ORDER BY cast (replace(phone,'-','') as bigint) ASC
      RANGE BETWEEN UNBOUNDED PRECEDING AND 2000  FOLLOWING
    )
FROM tpch.sf100.supplier limit 100

EXPLAIN VERBOSE
- Output[PlanNodeId 11][suppkey, name, nationkey, _col3] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} _col3 := sum (2:5) - Limit[PlanNodeId 350][100] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LocalExchange[PlanNodeId 534][SINGLE] () => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - RemoteStreamingExchange[PlanNodeId 487][GATHER] => [suppkey:bigint, name:varchar(25), nationkey:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - LimitPartial[PlanNodeId 486][100] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, sum:double] Estimates: {source: CostBasedSourceInfo, rows: 100 (8.01kB), cpu: ?, memory: ?, network: ?} - Project[PlanNodeId 575][projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, sum:double] - Window[PlanNodeId 5][partition by (nationkey), order by (expr ASC_NULLS_LAST)][$hashvalue] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue:bigint, sum:double] sum := sum(acctbal) RANGE UNBOUNDED_PRECEDING 2000 FOLLOWING (2:5) - LocalExchange[PlanNodeId 533][HASH][$hashvalue] (nationkey) => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 369,000,000.00, memory: 0.00, network: 77,000,000.00} - RemoteStreamingExchange[PlanNodeId 485][REPARTITION][$hashvalue_20] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue_20:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 292,000,000.00, memory: 0.00, network: 77,000,000.00} - Project[PlanNodeId 150][projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $operator$add:bigint, $hashvalue_21:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 215,000,000.00, memory: 0.00, network: 0.00} $operator$add := (expr) + (BIGINT'2000') - ScanProject[PlanNodeId 0,1][table = TableHandle {connectorId='tpch', connectorHandle='supplier:sf100.0', layout='Optional[supplier:sf100.0]'}, projectLocality = LOCAL] => [suppkey:bigint, name:varchar(25), acctbal:double, nationkey:bigint, expr:bigint, $hashvalue_21:bigint] Estimates: {source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 70,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1,000,000 (47.68MB), cpu: 138,000,000.00, memory: 0.00, network: 0.00} expr := CAST(replace(phone, VARCHAR'-', VARCHAR'') AS bigint) (7:7) $hashvalue_21 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(nationkey), BIGINT'0')) (7:7) acctbal := tpch:acctbal (7:6) name := tpch:name (7:6) nationkey := tpch:nationkey (7:6) phone := tpch:phone (7:6) suppkey := tpch:suppkey (7:6)

Expected Behavior or Use Case

It looks that it should be possible to apply an optimization on this particular case

Context

Find it during testing, it is not currently affecting a real environment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant