INC ONNX Runtime 3.x API design #1532
mengniwang95 asked this question in General
INC ONNX Runtime 3.x API Design
Target
Main principles
autotune is the exposed user interface API, which requires a set of configurations.
_quantize is an internal API.
GPTQConfig and autotune will use a set of configurations.

Repo Architecture
autotune are imported here.
calculate_scale_zp.

Previous Design
StaticQuant & SmoothQuant
Weight-only Quantization
New Design
StaticQuant & SmoothQuant
Configuration
Each argument to a config is either a single value or a list of values. If the arguments can be combined into more than one configuration, the returned object is a list of configurations, which is then used for autotuning.
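The expansion described above can be sketched as follows. This is only an illustration, not the INC implementation: the class name `SketchQuantConfig`, its fields `weight_bits` and `per_channel`, and the `expand` method are all hypothetical. For simplicity the sketch always returns a list, even when only one configuration results.

```python
from dataclasses import dataclass, fields
from itertools import product
from typing import List, Union


@dataclass
class SketchQuantConfig:
    """Hypothetical config class for illustration; not an INC class."""

    # Each field accepts a single value or a list of candidate values.
    weight_bits: Union[int, List[int]] = 8
    per_channel: Union[bool, List[bool]] = True

    def expand(self) -> List["SketchQuantConfig"]:
        """Return one concrete config per combination of candidate values."""
        names = [f.name for f in fields(self)]
        # Wrap single values so that every field is a list of candidates.
        choices = [
            value if isinstance(value, list) else [value]
            for value in (getattr(self, name) for name in names)
        ]
        # Cartesian product over all fields gives the set of configs to tune.
        return [
            SketchQuantConfig(**dict(zip(names, combo)))
            for combo in product(*choices)
        ]


# Single values only: exactly one configuration.
single = SketchQuantConfig(weight_bits=8).expand()

# Lists of values: every combination becomes a tuning candidate.
tuning_set = SketchQuantConfig(weight_bits=[4, 8], per_channel=[True, False]).expand()

print(len(single), len(tuning_set))  # 1 4
```

Under this assumption, an autotune-style entry point would iterate over `tuning_set` and evaluate each concrete configuration in turn.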
Quantize Interface
Weight-only Quantization
Configuration
Quantize Interface