Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance DataWriter to save memory during kudo serialization. #2891

Merged
merged 7 commits into from
Feb 11, 2025

Conversation

liurenjie1024
Copy link
Collaborator

Close #2890

  1. Add reserve method in DataWriter so that kudo data writer could reserve memory before actual writing happens. This helps avoiding unnecessay allocation and copy.
  2. Add OpenByteArrayOutputStream and its corresponding data writer. OpenByteArrayOutputStream could be used in customized shuffle manager to save memory.
  3. Add ByteArrayOutputStreamWriter which helps saving memory copy without introducing any changes to shuffle manager.

@liurenjie1024
Copy link
Collaborator Author

build

Copy link
Collaborator

@binmahone binmahone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look fine. I think that there are probably simpler ways to do this, but with the current code where we need to maintain compatibility with the JCUDF Serialization for the time being I think this is okay. It would be nice to have a follow on issue to come back and clean things up when we drop support for jcudf serialization.

@liurenjie1024 liurenjie1024 merged commit fd8b069 into NVIDIA:branch-25.04 Feb 11, 2025
5 checks passed
@liurenjie1024 liurenjie1024 deleted the ray/issue-2890 branch February 11, 2025 01:55
@liurenjie1024
Copy link
Collaborator Author

It would be nice to have a follow on issue to come back and clean things up when we drop support for jcudf serialization.

Sorry, could you be more specific on this? I'm not sure which part should be clean up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants