Remove _attn_implementation in LlamaBidirectionalModel constructor #12364
Important
The Update branch button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Remove the setting of _attn_implementation in the LlamaBidirectionalModel constructor.
The eager implementation is not as memory efficient as the sdpa implementation (which is selected by default) and encounters OOM errors when calibrating the model during quantization with long sequences (~3000 tokens) on a GPU with 48GB memory.
Removing this line enables the sdpa attention implementation to be used by default.
The choice of attention implementation can still be changed by passing the attn_implementation parameter to the model when it is loaded with the from_pretrained method.
Collection: llm
Changelog
Remove setting of _attn_implementation in the LlamaBidirectionalModel constructor.
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
Reviewer: Does the PR have correct import guards for all optional libraries?
PR Type:
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.