Skip to content

[SPARK-52240][SQL] Fix VectorizedDeltaLengthByteArrayReader.readBinary to handle current row #50966

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

wgtmac
Copy link
Member

@wgtmac wgtmac commented May 21, 2025

What changes were proposed in this pull request?

Fix VectorizedDeltaLengthByteArrayReader.readBinary to use currentRow to read the length of binary.

Why are the changes needed?

VectorizedDeltaLengthByteArrayReader.readBinary should use currentRow instead of rowId to read from the internal data stream. The variable rowId is the destination of the output column vector not the position to read binary data from input. This causes dirty read and throws ParquetDecodingException.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass all existing test cases.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 21, 2025
@wgtmac
Copy link
Member Author

wgtmac commented May 21, 2025

PTAL @sunchao @parthchandra

@pan3793
Copy link
Member

pan3793 commented May 21, 2025

Same fix but was not lucky to get reviewed. #46928

@wgtmac
Copy link
Member Author

wgtmac commented May 21, 2025

@pan3793 Thanks for letting me know! So is it a good time to revive that one (because it has good test)?

@pan3793
Copy link
Member

pan3793 commented May 21, 2025

@wgtmac #46928 gets reopened now, if you don't mind, we can wait for responses(maybe one or two days) from the original author of #46928, and see if he still wants to move the PR forward.

@wgtmac
Copy link
Member Author

wgtmac commented May 21, 2025

I'm very happy to wait for it.

cc @dongjoon-hyun

@dongjoon-hyun
Copy link
Member

Thank you for making a PR and pinging me, @wgtmac . Do you think we can add a test case for that dirty read, ParquetDecodingException?

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @wgtmac !

@sunchao sunchao changed the title [SPARK-52240][Core] Fix VectorizedDeltaLengthByteArrayReader.readBinary to handle current row [SPARK-52240][SQL] Fix VectorizedDeltaLengthByteArrayReader.readBinary to handle current row May 21, 2025
@wgtmac
Copy link
Member Author

wgtmac commented May 21, 2025

Thanks @dongjoon-hyun and @sunchao! I think #46928 is a good fix with test cases. Does it make sense to merge that? (I haven't been working on Spark for a long time so it may take me more time to setup the test cases)

@dongjoon-hyun
Copy link
Member

To all, we cannot merge #46928 because it's opened for branch-3.5. We need to merge yours.

Given that, @wgtmac , please add a commit with #46928 's test case with the authorship of him.

Then, I can merge with the script with both author.

@dongjoon-hyun
Copy link
Member

cc @djspiewak from #46928 as the branch-3.5 PR author, too.

@wgtmac
Copy link
Member Author

wgtmac commented May 21, 2025

I'll wait for a day or two to see if the original author will work on the master branch.

@djspiewak
Copy link

I'm happy to retarget master or do whatever is deemed best. Let me know your preferences. I'll reply to the test case question on the other PR to avoid confusing the threads.

@parthchandra
Copy link
Contributor

lgtm. Thanks for the fix @wgtmac, @djspiewak

@dongjoon-hyun
Copy link
Member

I'm happy to retarget master or do whatever is deemed best. Let me know your preferences. I'll reply to the test case question on the other PR to avoid confusing the threads.

Thank you, @djspiewak . We always prefer the original one (yours in this case).

@dongjoon-hyun
Copy link
Member

Please open a new one based on the master branch and ping us, @djspiewak .

@sunchao
Copy link
Member

sunchao commented May 22, 2025

Wow, what a surprise to see you here @djspiewak! We were in the same John Boyland’s type theory class 10+ years ago 😂

@djspiewak
Copy link

Small world! :D

@djspiewak
Copy link

Cross-posting: I've updated the other PR and retargeted on master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants