[SPARK-52240][SQL] Fix VectorizedDeltaLengthByteArrayReader.readBinary to handle current row #50966
PTAL @sunchao @parthchandra
Same fix but was not lucky to get reviewed. #46928
@pan3793 Thanks for letting me know! So is it a good time to revive that one (because it has good test)?
I'm very happy to wait for it.
Thank you for making a PR and pinging me, @wgtmac . Do you think we can add a test case for that dirty read,
Thanks for the fix @wgtmac !
Thanks @dongjoon-hyun and @sunchao! I think #46928 is a good fix with test cases. Does it make sense to merge that? (I haven't been working on Spark for a long time so it may take me more time to setup the test cases)
cc @djspiewak from #46928 as the
I'll wait for a day or two to see if the original author will work on the master branch.
I'm happy to retarget master or do whatever is deemed best. Let me know your preferences. I'll reply to the test case question on the other PR to avoid confusing the threads.
lgtm. Thanks for the fix @wgtmac, @djspiewak
Thank you, @djspiewak . We always prefer the original one (yours in this case).
Please open a new one based on the master branch and ping us, @djspiewak .
Wow, what a surprise to see you here @djspiewak! We were in the same John Boyland’s type theory class 10+ years ago 😂
Small world! :D
Cross-posting: I've updated the other PR and retargeted on master
What changes were proposed in this pull request?
Fix VectorizedDeltaLengthByteArrayReader.readBinary to use currentRow to read the length of each binary value.
Why are the changes needed?
VectorizedDeltaLengthByteArrayReader.readBinary should use currentRow instead of rowId to read from the internal data stream. The variable rowId is the destination position in the output column vector, not the position from which to read binary data in the input. Using rowId as the read position causes a dirty read and throws ParquetDecodingException.
Does this PR introduce any user-facing change?
No.
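The rowId versus currentRow distinction described under "Why are the changes needed?" can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not Spark's actual reader: the class DeltaLengthReaderSketch, the useCurrentRow flag, and the String[] output standing in for a column vector are all hypothetical names introduced for illustration. It shows that once the reader has already consumed some rows, indexing the lengths by the output position picks up the wrong lengths.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of a delta-length byte array reader.
// It is NOT Spark's VectorizedDeltaLengthByteArrayReader.
class DeltaLengthReaderSketch {
    private final int[] lengths;   // one decoded length per value in the page
    private final ByteBuffer data; // concatenated value bytes
    private int currentRow = 0;    // read position within the page

    DeltaLengthReaderSketch(String... values) {
        lengths = new int[values.length];
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i++) {
            byte[] b = values[i].getBytes(StandardCharsets.UTF_8);
            lengths[i] = b.length;
            bytes.write(b, 0, b.length);
        }
        data = ByteBuffer.wrap(bytes.toByteArray());
    }

    // Reads `total` values into out[rowId .. rowId + total - 1].
    // useCurrentRow = true models the fixed behavior; false models the bug,
    // where the output position is mistaken for the input position.
    void readBinary(int total, String[] out, int rowId, boolean useCurrentRow) {
        for (int i = 0; i < total; i++) {
            int lengthIndex = useCurrentRow ? currentRow + i : rowId + i;
            byte[] value = new byte[lengths[lengthIndex]];
            data.get(value); // consumes bytes at the buffer's current position
            out[rowId + i] = new String(value, StandardCharsets.UTF_8);
        }
        currentRow += total;
    }

    public static void main(String[] args) {
        for (boolean useCurrentRow : new boolean[] {true, false}) {
            DeltaLengthReaderSketch reader =
                new DeltaLengthReaderSketch("a", "bb", "ccc", "dddd");
            String[] firstBatch = new String[2];
            String[] secondBatch = new String[2];
            // Each batch writes to output position 0, but the second batch
            // must read lengths starting at input row 2.
            reader.readBinary(2, firstBatch, 0, useCurrentRow);
            reader.readBinary(2, secondBatch, 0, useCurrentRow);
            System.out.println((useCurrentRow ? "currentRow: " : "rowId:      ")
                + String.join(",", secondBatch));
        }
    }
}
```

In this model the mismatch only produces wrong values; with other length patterns the same mistake would run past the end of the buffer, which is the kind of failure the PR description attributes to the real reader.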
How was this patch tested?
Pass all existing test cases.
Was this patch authored or co-authored using generative AI tooling?
No.