Add Int4XPUTensorIntZP #2845
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2845
Note: Links to docs will display an error until the docs builds have completed.
❌ 5 New Failures as of commit 7063e56 with merge base ba111b0.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
"int4_xpu_int_zp is referring to the format used by int4 weight-only quantization on XPU with int zero point, which is a groupwise quantization format."
INT4_XPU_INT_ZP = "int4_xpu_int_zp"
please don't include int4 and xpu in the name; can you name this in terms of how the quantized data is packed?
The int4 weight on XPU is a plain-format tensor according to this doc: it just packs 2 int4 weight elements into a byte and then stores the 4*int4 as int32. So I changed the name to plain.
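A minimal pure-Python sketch of the packing described above, as I read it (the helper names are illustrative, and the little-endian int32 view is an assumption; the actual XPU kernel layout may differ):

```python
def pack_int4_pair(lo: int, hi: int) -> int:
    """Pack two unsigned 4-bit values into one byte, lo in the low nibble."""
    assert 0 <= lo < 16 and 0 <= hi < 16
    return lo | (hi << 4)


def bytes_to_int32(b0: int, b1: int, b2: int, b3: int) -> int:
    """View four packed bytes as one little-endian int32 (assumed byte order)."""
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


# Example: eight int4 elements -> four packed bytes -> one int32 word.
elems = [1, 2, 3, 4, 5, 6, 7, 8]
packed = [pack_int4_pair(elems[i], elems[i + 1]) for i in range(0, 8, 2)]
word = bytes_to_int32(*packed)
```

With the inputs above, `packed` is `[0x21, 0x43, 0x65, 0x87]` and `word` is `0x87654321`.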
qdata: packed int4 weight, always viewed as a 2D (N, K/2) tensor, last dimension is packed
preshuffling is specific to CPU kernels, see Note below.
is this correct? can you update this to describe how the XPU tensor is packed? examples can be found in
plain means the format that quantized Tensor data lays out elements in Tensor sequentially,
I guess the detailed doc can exist in packing_format.py; here a high-level summary of how the packing format looks is good enough
Updated.
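To illustrate the (N, K/2) view from the docstring above, here is a small pure-Python sketch (the function name and row-major pairing are illustrative assumptions, not the actual tensor-subclass code):

```python
def pack_rows(weight_int4, K):
    """Pack an N x K matrix of unsigned int4 values into an N x K/2 matrix
    of bytes, pairing adjacent elements along the last (K) dimension."""
    assert K % 2 == 0
    return [
        [row[c] | (row[c + 1] << 4) for c in range(0, K, 2)]
        for row in weight_int4
    ]


w = [[0, 1, 2, 3], [4, 5, 6, 7]]  # N=2, K=4, values already in [0, 16)
qdata = pack_rows(w, 4)           # N x K/2 = 2 x 2
```

Here `qdata` comes out as `[[0x10, 0x32], [0x54, 0x76]]`: each byte holds two adjacent int4 elements of a row, so the last dimension shrinks from K to K/2.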
This PR is used to enable Int4XPUTensorIntZP. The packing format name is "int4_xpu_int_zp".
Testcase:

```bash
python test/quantization/quantize_/workflows/int4/test_int4_xpu.py
```