Thanks for the great work!
Now that I have my own sparsified and GPTQ-quantized model, I'd like to run it in DeepSparse to see some inference speedup or other advantages. To export it to ONNX, I tried following https://github.com/neuralmagic/sparseml/tree/main/src/sparseml/transformers/sparsification/obcq#-how-to-export-the-one-shot-model, but it doesn't seem to work for GPTQ-quantized models. How do I export a GPTQ model (e.g., TheBloke/Llama-2-7B-Chat-GPTQ) to ONNX so that it can run in DeepSparse? Thanks.
Hey @Tangxinlu, `sparseml.export` is the appropriate pathway. Could you share your code and stack trace, so that I can reproduce the issue?
Hi @dbogunowicz, thanks for the quick reply!
Here is an example:
```bash
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
huggingface-cli download TechxGenus/Meta-Llama-3-8B-GPTQ --local-dir Meta-Llama-3-8B-GPTQ
# Add `"disable_exllama": true` to `"quantization_config"` in `Meta-Llama-3-8B-GPTQ/config.json`
sparseml.export --task text-generation ./Meta-Llama-3-8B-GPTQ
```
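In case it helps, the config edit from the commented step above can be scripted. A minimal sketch, assuming a standard AutoGPTQ-style `quantization_config` block in `config.json` (the path and key layout are assumptions, not verified against this exact checkpoint):

```python
import json

# Patch the HF config to disable the exllama kernels before export,
# per the manual step above. Assumes config.json already contains a
# "quantization_config" section (creates one if missing).
cfg_path = "Meta-Llama-3-8B-GPTQ/config.json"
with open(cfg_path) as f:
    cfg = json.load(f)
cfg.setdefault("quantization_config", {})["disable_exllama"] = True
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```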
Error:
```
...
sparseml/src/sparseml/pytorch/torch_to_onnx_exporter.py", line 100, in pre_validate
    return deepcopy(module).to("cpu").eval()
...
TypeError: cannot pickle 'module' object
```
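For context, a minimal sketch of what this `TypeError` generally means, assuming (unverified against the sparseml internals) that some attribute in the model's object graph is a Python module object, which `deepcopy` cannot pickle:

```python
from copy import deepcopy
import types

import torch.nn as nn

# deepcopy falls back to pickling for objects it cannot copy directly,
# and Python module objects are not picklable. If any submodule stores a
# reference to an imported module (e.g., a hypothetical handle to a GPU
# kernel backend), deepcopy(model) raises the same TypeError as above.
class QuantLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.backend = types  # stand-in for a stored module reference

model = QuantLayer()
deepcopy(model)  # TypeError: cannot pickle 'module' object
```

If that is indeed the cause, `pre_validate`'s `deepcopy(module)` would fail for any layer that keeps such a handle, regardless of the weights themselves.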
Environment:
Ohhh! What is the solution to this problem? @dbogunowicz
Hi, have you solved the problem? @Tangxinlu