I am currently evaluating the performance efficiency of a Hugging Face model by comparing two approaches: using the model directly through the Hugging Face model class versus disassembling and reassembling its 32 layers sequentially with the passthrough method from MergeKit.
Configuration Details
Below is the YAML configuration file used for the experiment:
Performance Metrics
The metric used for evaluation is generation time per token, as detailed below:
Input of 575 tokens:
Direct Model Usage: 3.4767779807548025 seconds per token
MergeKit Passthrough Model: 4.2156252472011655 seconds per token
Input of 311 tokens:
Direct Model Usage: 3.32432980222387 seconds per token
MergeKit Passthrough Model: 4.17318613631828 seconds per token
Input of 107 tokens:
Direct Model Usage: 2.503785534783288 seconds per token
MergeKit Passthrough Model: 4.000283993042268 seconds per token
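For reference, here is a minimal sketch of how a seconds-per-token figure like the ones above can be measured. It is not necessarily the exact script behind these numbers. The helper name `seconds_per_token` and the `generate_fn` callable are hypothetical: `generate_fn` is assumed to run one generation and return the number of newly generated tokens.

```python
import time
from typing import Callable


def seconds_per_token(generate_fn: Callable[[], int],
                      warmup: int = 1,
                      runs: int = 3) -> float:
    """Average wall-clock seconds per generated token.

    generate_fn is a hypothetical callable: it performs one generation
    and returns how many new tokens were produced.
    """
    # Warm-up calls are excluded so one-time costs (weight loading,
    # kernel compilation, cache allocation) do not skew the result.
    for _ in range(warmup):
        generate_fn()

    total_time = 0.0
    total_tokens = 0
    for _ in range(runs):
        start = time.perf_counter()
        new_tokens = generate_fn()
        total_time += time.perf_counter() - start
        total_tokens += new_tokens

    if total_tokens == 0:
        raise ValueError("generate_fn produced no tokens")
    return total_time / total_tokens
```

With a Hugging Face model, `generate_fn` could wrap `model.generate(**inputs, max_new_tokens=...)` and return the output length minus the prompt length. On a GPU, calling `torch.cuda.synchronize()` inside `generate_fn` before it returns keeps asynchronous kernels from being left out of the timing. Using the same prompt, the same `max_new_tokens`, and the same device for both models keeps the comparison fair.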
Why does this happen, and how can I fix it?
I noticed this when I tried to remove one layer from the model and test its performance: unexpectedly, the time per token increased instead of decreasing.
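One way to narrow this down is to check whether the merged checkpoint was written with different settings than the original. The sketch below diffs two configuration dictionaries. The helper name `diff_configs` is hypothetical, and whether any difference actually explains the slowdown is an assumption that would need to be confirmed.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # placeholder for a key present in only one config


def diff_configs(a: Dict[str, Any],
                 b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    differences = {}
    for key in sorted(set(a) | set(b)):
        left = a.get(key, MISSING)
        right = b.get(key, MISSING)
        if left != right:
            differences[key] = (left, right)
    return differences
```

With two loaded Hugging Face models (here called `original` and `merged`, both hypothetical names), this could be used as `diff_configs(original.config.to_dict(), merged.config.to_dict())`. Entries such as `torch_dtype` or `use_cache` would be worth a look if they show up in the result, since a different weight dtype or a disabled key/value cache can change generation speed.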