Add Int4XPUTensorIntZP #2845
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2845
Note: Links to docs will display an error until the docs builds have completed.
❌ 5 New Failures as of commit 7063e56 with merge base ba111b0.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
"int4_xpu_int_zp is referring to the format used by int4 weight-only quantization on XPU with int zero point, which is a groupwise quantization format."
INT4_XPU_INT_ZP = "int4_xpu_int_zp"
please don't include int4 and xpu in the name; can you name this in terms of how the quantized data is packed?
The int4 weight on XPU is a plain-format tensor according to this doc: it just packs 2 int4 weight elements into a byte and then stores the 4*int4 as int32. So I changed the name to plain.
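A minimal pure-Python sketch of the packing described above, as I read it (the helper names are illustrative, and the little-endian int32 view is an assumption; the actual XPU kernel layout may differ):

```python
def pack_int4_pair(lo: int, hi: int) -> int:
    """Pack two unsigned 4-bit values into one byte, lo in the low nibble."""
    assert 0 <= lo < 16 and 0 <= hi < 16
    return lo | (hi << 4)


def bytes_to_int32(b0: int, b1: int, b2: int, b3: int) -> int:
    """View four packed bytes as one little-endian int32 (assumed byte order)."""
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


# Example: eight int4 elements -> four packed bytes -> one int32 word.
elems = [1, 2, 3, 4, 5, 6, 7, 8]
packed = [pack_int4_pair(elems[i], elems[i + 1]) for i in range(0, 8, 2)]
word = bytes_to_int32(*packed)
```

With the inputs above, `packed` is `[0x21, 0x43, 0x65, 0x87]` and `word` is `0x87654321`.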
qdata: packed int4 weight, always viewed as a 2D (N, K/2) tensor, last dimension is packed
preshuffling is specific to CPU kernels, see Note below.
is this correct? can you update this to describe how the XPU tensor is packed? examples can be found in
plain means the format that quantized Tensor data lays out elements in Tensor sequentially,
I guess the detailed doc can exist in packing_format.py; here a high-level summary of how the packing format looks is good enough
Updated.
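To illustrate the (N, K/2) view from the docstring above, here is a small pure-Python sketch (the function name and row-major pairing are illustrative assumptions, not the actual tensor-subclass code):

```python
def pack_rows(weight_int4, K):
    """Pack an N x K matrix of unsigned int4 values into an N x K/2 matrix
    of bytes, pairing adjacent elements along the last (K) dimension."""
    assert K % 2 == 0
    return [
        [row[c] | (row[c + 1] << 4) for c in range(0, K, 2)]
        for row in weight_int4
    ]


w = [[0, 1, 2, 3], [4, 5, 6, 7]]  # N=2, K=4, values already in [0, 16)
qdata = pack_rows(w, 4)           # N x K/2 = 2 x 2
```

Here `qdata` comes out as `[[0x10, 0x32], [0x54, 0x76]]`: each byte holds two adjacent int4 elements of a row, so the last dimension shrinks from K to K/2.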
This PR is used to enable Int4XPUTensorIntZP. The packing format name is "int4_xpu_int_zp".
Testcase:

```bash
python test/quantization/quantize_/workflows/int4/test_int4_xpu.py
```