Onnx slim transform #536


Open · wants to merge 6 commits into base: main

Conversation


@tchawada commented Aug 12, 2025

Performance comparison between the onnx-slim transformed model and the original model, for GPT2:

| Metric | With Onnx-Slim | Without Onnx-Slim |
| --- | --- | --- |
| Average prefill time (TTFT) | 0.01 sec | 0.01 sec |
| Decode throughput | 408.75 tokens/sec | 384.3 tokens/sec |
| Total throughput | 399.23 tokens/sec | 374.99 tokens/sec |
| Total (E2E) inference time | 0.31 sec | 0.33 sec |
| onnx file size | 420K | 516K |
| qpc file size | 391M | 391M |
| GPT2LMHeadModel_0.onnx.data | 622.94M | 622.94M |
| Time taken to slim | 19 sec | n/a |
| Compile time | 39.90 sec | 36.44 sec |

@tchawada closed this Aug 12, 2025

@tchawada reopened this Aug 12, 2025
@quic-amitraj (Contributor)

Please also test once with the full model https://huggingface.co/meta-llama/Llama-2-7b-chat-hf.

@quic-amitraj (Contributor) commented Aug 12, 2025

Apply `ruff check` and `ruff format` @tchawada.
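
For reference, a minimal sketch of the usual commands (assuming the repository's default ruff configuration; `--fix` applies the safe autofixes):

```sh
ruff check --fix .
ruff format .
```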

@tchawada (Author) commented Aug 12, 2025 via email

@quic-hemagnih (Contributor) left a comment

Please resolve the lint warnings

```diff
@@ -37,6 +39,8 @@ class FP16ClipTransform(OnnxTransform):
     Clips the tensor values to be in FP16 range, but preserves -inf values.
     """

+    print("FP16ClipTransform is applied")
```
Contributor:

I would suggest using a logger, rather than print, for any messages.
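
A minimal sketch of that change, assuming the standard library `logging` module rather than any repo-specific logging utility:

```python
import logging

# Module-level logger; the name and basicConfig call are illustrative,
# not the repository's actual logging setup.
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Replaces the bare print inside the transform:
logger.info("FP16ClipTransform is applied")
```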

```python
onnx_slim_transform = kwargs.get("enable_onnx_slim_transform", False)
temp_onnx_path = kwargs.get("temp_onnx_path", None)
if onnx_slim_transform:
    print("onnx slim transform done")
```
Contributor:

Remove the print.

print("onnx slim transform done")
transformed = True
slimmed_model = onnxslim.slim(model)
onnx.save(slimmed_model, temp_onnx_path)
Contributor:

Add type checking or validation: ensure temp_onnx_path is not None before saving.
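
A minimal sketch of the suggested guard, using the variables from the diff above; the helper name `_slim_and_save` is hypothetical, not an existing function in this PR:

```python
import os

import onnx
import onnxslim

def _slim_and_save(model, temp_onnx_path):
    """Slim `model` and save it to `temp_onnx_path`, validating the path first."""
    if temp_onnx_path is None:
        raise ValueError("temp_onnx_path must be set when the onnx-slim transform is enabled")
    if not isinstance(temp_onnx_path, (str, os.PathLike)):
        raise TypeError(f"temp_onnx_path must be a path, got {type(temp_onnx_path).__name__}")
    slimmed_model = onnxslim.slim(model)
    onnx.save(slimmed_model, temp_onnx_path)
    return slimmed_model
```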

@tchawada (Author) commented Aug 13, 2025 via email

@quic-rishinr (Contributor)

Instead of adding onnx_slim_transform to every AutoModel class, could we create a transform configuration module that returns the enabled/disabled transforms as a dict, and apply the transforms in the base class based on that config? This could cover both PyTorch and ONNX transforms.
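
A rough sketch of that idea; all names here (`DEFAULT_TRANSFORMS`, `get_enabled_transforms`) are hypothetical, not existing APIs in this repo:

```python
# Hypothetical transform-configuration module: one dict describes which
# PyTorch and ONNX transforms are enabled, instead of each AutoModel
# class hard-coding its own list.
DEFAULT_TRANSFORMS = {
    "pytorch": {"CustomOpsTransform": True},
    "onnx": {"FP16ClipTransform": True, "OnnxSlimTransform": False},
}

def get_enabled_transforms(overrides=None):
    """Merge user overrides into the defaults; return enabled transform names per stage."""
    merged = {stage: dict(flags) for stage, flags in DEFAULT_TRANSFORMS.items()}
    for stage, flags in (overrides or {}).items():
        merged.setdefault(stage, {}).update(flags)
    return {stage: [name for name, on in flags.items() if on] for stage, flags in merged.items()}

# The base class would consult this config at export time, e.g.:
print(get_enabled_transforms({"onnx": {"OnnxSlimTransform": True}}))
# {'pytorch': ['CustomOpsTransform'], 'onnx': ['FP16ClipTransform', 'OnnxSlimTransform']}
```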

@inisis commented Aug 21, 2025

Hi, I'm the author of onnxslim. Thanks for using it! onnxslim applies to every single ONNX model; feel free to message me if you have any problems. I'm looking forward to more cooperation and integration with your projects.
