Workaround thrust-copy-if limit in wordpiece-tokenizer #12168

Merged
14 commits merged into rapidsai:branch-23.02 from fix-wpt-thrust-copy-if
Nov 28, 2022

Conversation

davidwendt
Contributor

Description

Adds a workaround in nvtext's wordpiece tokenizer for a limitation in thrust::copy_if, which fails when the input iterator range spans more than int-max elements.

Found existing thrust issue: NVIDIA/cccl#747
This change calls thrust::copy_if in chunks whenever the iterator range can span more than int-max elements.

Found while working on #12079
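
For readers unfamiliar with the chunking approach described above, here is a minimal host-side sketch of the idea. It is not the code from this PR: `chunked_copy_if` and `max_chunk` are hypothetical names, and `std::copy_if` stands in for `thrust::copy_if` so the example compiles without CUDA. In the tokenizer the chunk limit would presumably be int-max, and the same loop shape would apply because both algorithms return the advanced output iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper, for illustration only (not the PR's implementation).
// Runs copy_if over [first, last) in pieces of at most max_chunk elements so
// that no single call sees a range larger than the limit.
// std::copy_if is used here as a host-side stand-in for thrust::copy_if.
template <typename InputIt, typename OutputIt, typename Predicate>
OutputIt chunked_copy_if(InputIt first, InputIt last, OutputIt out, Predicate pred,
                         std::int64_t max_chunk)
{
  assert(max_chunk > 0);
  auto remaining = static_cast<std::int64_t>(std::distance(first, last));
  while (remaining > 0) {
    // Size of this chunk: the limit, or whatever is left if that is smaller.
    auto const count     = std::min(remaining, max_chunk);
    auto const chunk_end = std::next(first, count);
    // Each call continues writing where the previous chunk's output ended.
    out   = std::copy_if(first, chunk_end, out, pred);
    first = chunk_end;
    remaining -= count;
  }
  return out;
}
```

Because each call resumes at the output position returned by the previous one, the relative order of the selected elements is the same as for a single unchunked call.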

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt added the bug, 2 - In Progress, libcudf, and non-breaking labels Nov 16, 2022
@davidwendt self-assigned this Nov 16, 2022
@codecov

codecov bot commented Nov 17, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.02@7426a06).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-23.02   #12168   +/-   ##
===============================================
  Coverage                ?   88.26%           
===============================================
  Files                   ?      137           
  Lines                   ?    22586           
  Branches                ?        0           
===============================================
  Hits                    ?    19935           
  Misses                  ?     2651           
  Partials                ?        0           

@davidwendt
Contributor Author

rerun tests

@davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label Nov 18, 2022
@davidwendt marked this pull request as ready for review November 18, 2022 14:05
@davidwendt requested a review from a team as a code owner November 18, 2022 14:05
@davidwendt requested review from vyasr and ttnghia and removed the request for a team November 18, 2022 14:05
Contributor

@vyasr vyasr left a comment

LGTM. A couple of minor non-blocking suggestions.

cpp/src/text/subword/wordpiece_tokenizer.cu (two review threads, both outdated and resolved)
@davidwendt
Contributor Author

@gpucibot merge

@rapids-bot merged commit 82b646e into rapidsai:branch-23.02 Nov 28, 2022
@davidwendt deleted the fix-wpt-thrust-copy-if branch November 28, 2022 13:15
Labels
3 - Ready for Review, bug, libcudf, non-breaking