Workaround thrust-copy-if limit in wordpiece-tokenizer #12168
Codecov Report
Additional details and impacted files
@@            Coverage Diff            @@
##    branch-23.02    #12168    +/-    ##
=======================================
  Coverage      ?    88.26%
=======================================
  Files         ?       137
  Lines         ?     22586
  Branches      ?         0
=======================================
  Hits          ?     19935
  Misses        ?      2651
  Partials      ?         0
rerun tests
LGTM. A couple of minor non-blocking suggestions.
@gpucibot merge
Description
Works around a limitation in thrust::copy_if in nvtext's wordpiece-tokenizer: the call fails if the input iterator range spans more than int-max elements. Found existing thrust issue: NVIDIA/cccl#747

This calls thrust::copy_if in chunks if the iterator range can span more than int-max elements.

Found while working on #12079
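
The chunking idea described above can be illustrated with a minimal host-side sketch. This is not the code from this PR: it uses std::copy_if as a stand-in for thrust::copy_if so it compiles without CUDA, and the helper name chunked_copy_if and its chunk_size parameter are hypothetical names chosen for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of cudf or thrust): applies copy_if to
// [first, last) in consecutive pieces of at most chunk_size elements, so that
// no single copy_if call sees a range longer than chunk_size.
// std::copy_if stands in here for thrust::copy_if.
template <typename InputIt, typename OutputIt, typename Predicate>
OutputIt chunked_copy_if(InputIt first,
                         InputIt last,
                         OutputIt result,
                         Predicate pred,
                         std::int64_t chunk_size)
{
  assert(chunk_size > 0);
  auto remaining = static_cast<std::int64_t>(std::distance(first, last));
  while (remaining > 0) {
    // Size of this piece: a full chunk, or whatever is left at the tail.
    auto const count     = std::min(remaining, chunk_size);
    auto const chunk_end = std::next(first, count);
    // Each call continues writing where the previous one stopped, so the
    // selected elements keep their original relative order.
    result = std::copy_if(first, chunk_end, result, pred);
    first  = chunk_end;
    remaining -= count;
  }
  return result;
}
```

In a setting like the one described, chunk_size would presumably be std::numeric_limits<int>::max(), so a range that already fits takes a single pass through the loop. A small chunk size makes the behaviour easy to check: copying the even values out of {1, ..., 10} with chunk_size 3 gives the same result as one unchunked call.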
Checklist