
Conversation

liangan1
Collaborator

@liangan1 liangan1 commented Aug 22, 2025

This PR enables the Int4XPUTensorIntZP. The packing format name is "int4_xpu_int_zp".
Testcase:

```bash
python test/quantization/quantize_/workflows/int4/test_int4_xpu.py
```


pytorch-bot bot commented Aug 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2845

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit 7063e56 with merge base ba111b0:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 22, 2025
@liangan1 liangan1 added the topic: new feature Use this tag if this PR adds a new feature label Aug 25, 2025
@liangan1 liangan1 changed the title [WIP]Add Int4XPUTensorIntZP Add Int4XPUTensorIntZP Aug 25, 2025
Comment on lines +44 to +45
"int4_xpu_int_zp is referring to the format used by int4 weight-only quantization on XPU with int zero point, which is a groupwise quantization format."
INT4_XPU_INT_ZP = "int4_xpu_int_zp"
Contributor
please don't include int4 and xpu in the name, can you name this in terms of how the quantized data is packed?

Collaborator Author

@liangan1 liangan1 Aug 26, 2025

The int4 weight on XPU is a plain-format tensor according to this doc: it just packs 2 int4 weight elements into a byte and then views the packed bytes as int32 (8 int4 values per int32). So I changed it to "plain".
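The packing described above can be sketched in plain Python (a minimal illustration only, not the actual XPU kernel code; the low-nibble-first ordering and the helper names `pack_int4` / `view_as_int32` are assumptions for this example):

```python
import struct

def pack_int4(vals):
    """Pack a list of unsigned 4-bit values (0..15) into bytes,
    two per byte: first value in the low nibble, second in the high nibble.
    (Nibble order is an assumption; the real kernel layout may differ.)"""
    assert len(vals) % 8 == 0, "need a multiple of 8 nibbles to view as int32"
    assert all(0 <= v <= 15 for v in vals), "values must fit in 4 bits"
    out = bytearray()
    for lo, hi in zip(vals[0::2], vals[1::2]):
        out.append((hi << 4) | lo)
    return bytes(out)

def view_as_int32(packed):
    """Reinterpret the packed bytes as little-endian int32 words,
    so each int32 carries 8 int4 values."""
    assert len(packed) % 4 == 0
    return list(struct.unpack("<%di" % (len(packed) // 4), packed))

# Eight int4 values -> 4 packed bytes -> one int32 word.
packed = pack_int4([1, 2, 3, 4, 5, 6, 7, 8])
words = view_as_int32(packed)
```

In the actual tensor this byte stream is the last (packed) dimension, which is why a weight of logical shape (N, K) is stored as a 2D (N, K/2) byte tensor.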

Comment on lines 33 to 34
qdata: packed int4 weight, always viewed as a 2D (N, K/2) tensor, last dimension is packed
preshuffling is specific to CPU kernels, see Note below.
Contributor

@jerryzh168 jerryzh168 Aug 26, 2025


is this correct? can you update this to describe how xpu tensor is packed? examples can be found in

plain means the format that quantized Tensor data lays out elements in Tensor sequentially,

I guess the detailed doc can live in packing_format.py; here, a high-level summary of what the packing format looks like is probably good enough

Collaborator Author

Updated.
