Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

array_top_n: Nested calls fail on depths greater than 5 #24560

Open
peterenescu opened this issue Feb 13, 2025 · 5 comments
Open

array_top_n: Nested calls fail on depths greater than 5 #24560

peterenescu opened this issue Feb 13, 2025 · 5 comments
Labels

Comments

@peterenescu
Copy link

TL;DR: array_top_n nested calls fail on depths greater than 5.

Your Environment

  • Presto version used: 0.289
  • Deployment (Cloud or On-prem): On-prem

Expected Behavior

Function array_top_n should accept nesting to deeper levels. Presto on Velox accepts depths of at least 7

Current Behavior

Query fails due to JVM bytecode limitations on nested depth greater than 5.

presto> SELECT array_top_n(array_top_n(array_top_n(array_top_n(array_top_n(array[1,2,3], 1), 1), 1), 1), 1);
 _col0 
-------
 [3]   
(1 row)

Query ... FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
[Latency: client-side: 0:04, server-side: 0:04] [0 rows, 0B] [0 rows/s, 0B/s]

presto> SELECT array_top_n(array_top_n(array_top_n(array_top_n(array_top_n(array_top_n(array[1,2,3], 1), 1), 1), 1), 1), 1);
Query 20250213_232057_61866_2pbc7 failed: Query results in large bytecode exceeding the limits imposed by JVM
@spershin
Copy link
Contributor

This is probably because Presto Java compiles expression code or calls and this particular nested calls cause some code explosion?
This article talks about 64KB limit for method size: https://www.baeldung.com/java-code-too-large-error

@kgpai
Copy link
Contributor

kgpai commented Feb 14, 2025

@amitkdutta
Copy link
Contributor

@rschlussel
Copy link
Contributor

yeah, this function produces very long sql when it does the rewrite, so too many levels of nesting of this function results in too much bytecode. I don't think it's worth the investment to improve this in Java.

@spershin
Copy link
Contributor

Interesting, thanks @kgpai for pointing out the rewrite code.
I'm seeing this:

"RETURN IF(n < 0, fail('Parameter n: ' || cast(n as varchar) || ' to ARRAY_TOP_N is negative'), SLICE(ARRAY_SORT_DESC(input), 1, n))"
And I'm wondering if the check for negative N is even needed: N is simply passed to SLICE() and SLICE() treats negative N as 'starting from the end', meaning it would return the "BOTTOM N" in case N < 0.

In my opinion it is a valid "upgrade" to ARRAY_TOP_N() :)

However, it does change behavior and we might not want it.

We then could introduce an internal-only SLICE_NON_NEGATIVE() function, which we can use in rewrite without IF() statement and which will differ from SLICE by only checking that N is not negative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: 🆕 Unprioritized
Development

No branches or pull requests

5 participants