Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve kudo concat performance. #2915

Merged
merged 10 commits into from
Feb 15, 2025

Conversation

liurenjie1024
Copy link
Collaborator

Closes #2913
Closes #2579

This pr improves kudo concatting performace by following approaches:

  1. It simplifies SchemaVisitor interface to avlid unnecessary allocation.
  2. It replaced lists in KudoTableMerger and MergeInfoCalc with arrays to avoid unnecessary allocation.
  3. It improved algorithm of concating validity buffer by remove all possible branch prediction and method call in tight loop codes.

@liurenjie1024
Copy link
Collaborator Author

build

@liurenjie1024
Copy link
Collaborator Author

build

@liurenjie1024 liurenjie1024 marked this pull request as draft February 12, 2025 06:07
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main question is how much faster did this make shuffle? This is a performance improvement, but I don't see any benchmark results for how much better it is.

@liurenjie1024
Copy link
Collaborator Author

build

@liurenjie1024
Copy link
Collaborator Author

My main question is how much faster did this make shuffle? This is a performance improvement, but I don't see any benchmark results for how much better it is.

I did some local micro benchmark about concating, and it shows 20% performance improvement, mainly introduced by optimizing concat validity buffer. My local branch shows nds result about 2% perf improvement.

@liurenjie1024
Copy link
Collaborator Author

The code is in fact ready for review, I'm marking it as draft since I'm running integration test to verify it.

revans2
revans2 previously approved these changes Feb 13, 2025
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I would prefer better names than caseOne, caseTwo, and caseThree, but it is documented and probably good enoguh.

@liurenjie1024
Copy link
Collaborator Author

build

@liurenjie1024 liurenjie1024 marked this pull request as ready for review February 15, 2025 06:43
@liurenjie1024
Copy link
Collaborator Author

Run integeration tests offline and it passed, merge this now.

@liurenjie1024 liurenjie1024 merged commit 3d12a86 into NVIDIA:branch-25.04 Feb 15, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve kudo's concat performance by simplifying codes. [FEA] Optimize kudo when merging validity buffer.
2 participants