fp8 quantization for inference. #1316
@singh-mitali, can you take a look at these quantization changes?
@@ -496,6 +521,8 @@ def einsum_fn_with_rhs_qtensor(
  def einsum_fn_with_rhs_qtensor_and_dequant(self, value):
    return self.einsum_fn_with_rhs_qtensor(
        value,
        lhs_dequant_mode=aqt_config.DequantMode.THIS_INPUT,
Why was this change required?
Adding this would make it perform better with fp8. int8 performance stays the same.
Can we make this a separate function or pass a flag?
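One way the flag option might look is sketched below. This is only an illustration under assumptions, not the change that was made in this PR: the `DequantMode` enum is a local stand-in for `aqt_config.DequantMode` (only `THIS_INPUT` appears in the diff; the `OUTPUT` member is assumed), and `rhs_qtensor_einsum_kwargs` and its `dequant_lhs_input` parameter are hypothetical names.

```python
import enum


class DequantMode(enum.Enum):
  # Local stand-in for aqt_config.DequantMode, for illustration only.
  # THIS_INPUT is the member used in the diff; OUTPUT is an assumed member.
  OUTPUT = enum.auto()
  THIS_INPUT = enum.auto()


def rhs_qtensor_einsum_kwargs(dequant_lhs_input: bool = False) -> dict:
  """Hypothetical helper: builds the extra keyword arguments for the einsum call.

  With the flag off, nothing is passed, so the callee keeps whatever default
  it already had. With the flag on, the lhs dequant mode from the diff is
  requested explicitly.
  """
  if dequant_lhs_input:
    return {"lhs_dequant_mode": DequantMode.THIS_INPUT}
  return {}


# Existing behaviour: no extra arguments are passed.
default_kwargs = rhs_qtensor_einsum_kwargs()

# Opt-in behaviour, e.g. for an fp8 configuration.
fp8_kwargs = rhs_qtensor_einsum_kwargs(dequant_lhs_input=True)
```

Keeping the flag off by default would leave existing callers untouched, so the new dequant mode only applies where it is requested.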
Description
Start with a short description of what the PR does and how this is a change from
the past.
The rest of the description includes relevant details and context, examples:
If the change fixes a bug or a GitHub issue, please include a link, e.g.,:
FIXES: b/123456
FIXES: #123456
Notice 1 Once all tests pass, the "pull ready" label will automatically be assigned. This label is used
for administrative purposes. Please do not add it manually.
Notice 2 For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.
Tests
Please describe how you tested this change, and include any instructions and/or
commands to reproduce.
Checklist
Before submitting this PR, please make sure (put X in square brackets):