Thanks for the great work!
Now that I have my own sparsified and GPTQ-quantized model, I'd like to run it in DeepSparse to see some inference speedup or other advantages. To export it to ONNX, I tried following https://github.com/neuralmagic/sparseml/tree/main/src/sparseml/transformers/sparsification/obcq#-how-to-export-the-one-shot-model, but it doesn't seem to work for GPTQ-quantized models. How do I export a GPTQ model (e.g., TheBloke/Llama-2-7B-Chat-GPTQ) to ONNX so that it can run in DeepSparse? Thanks.
Hey @Tangxinlu, `sparseml.export` is the appropriate pathway. Could you share your code and stack trace, so that I can reproduce the issue?
Hi @dbogunowicz, thanks for the quick reply!
Here is an example:
```bash
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
huggingface-cli download TechxGenus/Meta-Llama-3-8B-GPTQ --local-dir Meta-Llama-3-8B-GPTQ
# Add `"disable_exllama": true` to `"quantization_config"` in `Meta-Llama-3-8B-GPTQ/config.json`
sparseml.export --task text-generation ./Meta-Llama-3-8B-GPTQ
```
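In case it helps, the config edit from the commented step above can be scripted. A minimal sketch, assuming a standard AutoGPTQ-style `quantization_config` block in `config.json` (the path and key layout are assumptions, not verified against this exact checkpoint):

```python
import json

# Patch the HF config to disable the exllama kernels before export,
# per the manual step above. Assumes config.json already contains a
# "quantization_config" section (creates one if missing).
cfg_path = "Meta-Llama-3-8B-GPTQ/config.json"
with open(cfg_path) as f:
    cfg = json.load(f)
cfg.setdefault("quantization_config", {})["disable_exllama"] = True
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```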
Error:
```
...
sparseml/src/sparseml/pytorch/torch_to_onnx_exporter.py", line 100, in pre_validate
    return deepcopy(module).to("cpu").eval()
...
TypeError: cannot pickle 'module' object
```
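For context, a minimal sketch of what this `TypeError` generally means, assuming (unverified against the sparseml internals) that some attribute in the model's object graph is a Python module object, which `deepcopy` cannot pickle:

```python
from copy import deepcopy
import types

import torch.nn as nn

# deepcopy falls back to pickling for objects it cannot copy directly,
# and Python module objects are not picklable. If any submodule stores a
# reference to an imported module (e.g., a hypothetical handle to a GPU
# kernel backend), deepcopy(model) raises the same TypeError as above.
class QuantLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.backend = types  # stand-in for a stored module reference

model = QuantLayer()
deepcopy(model)  # TypeError: cannot pickle 'module' object
```

If that is indeed the cause, `pre_validate`'s `deepcopy(module)` would fail for any layer that keeps such a handle, regardless of the weights themselves.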
Environment:
Ohhh! What is the solution to this problem? @dbogunowicz
Hi, have you solved the problem? @Tangxinlu