Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix approx_percentile to have constant pct for input rows #24600

Merged
merged 1 commit into from
Feb 21, 2025

Conversation

yuandagits
Copy link
Contributor

Description

See full context here: #24592

Add validation to ensure that the percentile used in approx_percentile is constant between all rows.

Motivation and Context

approx_percentile was using the last percentile in the input rows used for aggregation; however, the percentile argument should be constant between the rows for the aggregation as per the Presto documentation.

See: #24592

Impact

Users who leverage this function may now see errors if they were passing in non constant percentiles between the rows. For example, the following query will now throw whereas before, it would use 0.9 as the percentile value.

SELECT 
  APPROX_PERCENTILE(score, pct) 
FROM 
  (
    SELECT 
      * 
    FROM 
      (
        VALUES 
          (0.01, 0.1), 
          (0.02, 0.5), 
          (0.03, 0.9)
      ) AS t(score, pct)
  )

Test Plan

Added unit test and also manual test via Presto-cli

presto:tpch> SELECT 
          ->   APPROX_PERCENTILE(score, pct) 
          -> FROM 
          ->   (
          ->     SELECT 
          ->       * 
          ->     FROM 
          ->       (
          ->         VALUES 
          ->           (0.01, 0.1), 
          ->           (0.02, 0.5), 
          ->           (0.03, 0.9)
          ->       ) AS t(score, pct)
          ->   );

Query 20250220_165550_00073_v3779, FAILED, 1 node, 6 splits
http://localhost:8080/ui/query.html?20250220_165550_00073_v3779
Splits:   6 queued, 0 running, 0 done
CPU Time: 0.0s total,     0 rows/s,     0B/s, 0% active
Per Node: 0.0 parallelism,     0 rows/s,     0B/s

Query 20250220_165550_00073_v3779, FAILED, 1 node
http://localhost:8080/ui/query.html?20250220_165550_00073_v3779
Splits: 6 total, 0 done (0.00%)
[Latency: client-side: 0:01, server-side: 0:01] [0 rows, 0B] [0 rows/s, 0B/s]

Query 20250220_165550_00073_v3779 failed: Percentile argument must be constant for all input rows: 0.5 vs. 0.1
com.facebook.presto.spi.PrestoException: Percentile argument must be constant for all input rows: 0.5 vs. 0.1
	at com.facebook.presto.util.Failures.checkCondition(Failures.java:89)
	at com.facebook.presto.operator.aggregation.ApproximateLongPercentileAggregations.checkPercentile(ApproximateLongPercentileAggregations.java:141)
	at com.facebook.presto.operator.aggregation.ApproximateLongPercentileAggregations.addInput(ApproximateLongPercentileAggregations.java:101)
	at com.facebook.presto.operator.aggregation.ApproximateLongPercentileAggregations.input(ApproximateLongPercentileAggregations.java:46)
	at com.facebook.presto.operator.aggregation.ApproximateRealPercentileAggregations.input(ApproximateRealPercentileAggregations.java:47)
	at com.facebook.presto.$gen.RealRealDoubleApproxPercentileAccumulator_20250220_165114_2423.addInput(Unknown Source)
	at com.facebook.presto.operator.Aggregator.processPage(Aggregator.java:58)
	at com.facebook.presto.operator.AggregationOperator.addInput(AggregationOperator.java:153)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:451)
	at com.facebook.presto.operator.Driver.lambda$processFor$10(Driver.java:324)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:750)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:317)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:165)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:621)
	at com.facebook.presto.$gen.Presto_null__testversion____20250220_164542_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... 
* ... 

Hive Connector Changes
* ... 
* ... 

If release note is NOT required, use:

== NO RELEASE NOTE ==

@yuandagits yuandagits changed the title Fix approx_percentile to have constant pct for input rows Fix: approx_percentile to have constant pct for input rows Feb 20, 2025
@yuandagits yuandagits changed the title Fix: approx_percentile to have constant pct for input rows Fix approx_percentile to have constant pct for input rows Feb 20, 2025
@yuandagits
Copy link
Contributor Author

@yuandagits yuandagits marked this pull request as ready for review February 20, 2025 17:22
@yuandagits yuandagits requested a review from a team as a code owner February 20, 2025 17:22
@yuandagits yuandagits merged commit 0041846 into prestodb:master Feb 21, 2025
54 checks passed
@facebook-github-bot
Copy link
Collaborator

@yuandagits has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants