INC PyTorch 3.x API Design #1527
Unanswered
xin3he asked this question in Show and tell
Replies: 2 comments
Load API

API design from @xin3he:

```python
def load(model_name_or_path, original_model=None, format='default', device='cpu', **kwargs):
    """
    Parameters:
        model_name_or_path - if 'format' is set to 'huggingface', it is the Hugging Face
            model_name_or_path. If 'format' is set to 'default', it is the checkpoint
            directory and must not be None; it works together with the 'original_model'
            parameter to load an INC-quantized INT8/FP8 model from local storage.
        original_model - optional, only needed if 'format' is set to 'default'.
            It works together with the 'model_name_or_path' parameter to load an
            INC-quantized INT8/FP8 model from local storage.
            For a TorchScript model, original_model is not required.
        format - 'default' or 'huggingface'. Supports loading a Hugging Face model
            or an INC-quantized model.
        device - 'cpu', 'hpu' or 'gpu'. Specifies the device the model will be loaded to.
        kwargs - parameters required by the Hugging Face `from_pretrained` API,
            such as 'trust_remote_code' and 'revision'.
    """
```

Benefits of this design:
Usage demo
Related PR
INC PyTorch 3.x API Design
Target
Main principles
`quantize` and `autotune` are the user-interface APIs for quantization. `quantize` performs a one-time quantization, while `autotune` requires a set of configurations; `GPTQConfig` and `autotune` will use a set of configurations.
Repo Architecture
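A minimal sketch of the two-API shape described above. The signatures and the evaluation callback are illustrative assumptions, not the exact INC 3.x interface:

```python
# Sketch of the two user-facing APIs. Signatures are assumptions for
# illustration, not INC's exact 3.x interface.

def quantize(model, quant_config):
    # One-time quantization with a single configuration.
    return {"model": model, "config": quant_config}

def autotune(model, tune_configs, eval_fn):
    # Try each candidate configuration and keep the best-scoring result.
    best, best_score = None, float("-inf")
    for cfg in tune_configs:
        candidate = quantize(model, cfg)
        score = eval_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

The split keeps the common path simple: `quantize` is a single call with one config, and `autotune` layers a search loop on top of it.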
- `BF16`/`FP16`/`FP8`
- `quantize`, `autotune` are imported here.
- `algorithms` folder.
- `fp8`/`ipex`/`weight_only` folder.
- `WeightOnlyLinear`.
- `fetch_modules`.
- `GGML_TYPE_Q4_K`.
Previous Design
IPEX StaticQuant & SmoothQuant
PyTorch Weight-only Quantization
New Design
IPEX StaticQuant & SmoothQuant
Configuration
The argument to a config field is a single value or a list of values. If the parameters can be assembled into different configurations, the returned object will be a list of configurations used for autotuning.
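One way the expansion described above can work is a cross-product over list-valued fields. This is a sketch under that assumption (the `expand` helper is hypothetical, not an INC API):

```python
import itertools

def expand(**params):
    # Each parameter may be a single value or a list of candidate values.
    # If any parameter is a list, return the cross-product of candidates
    # as a list of concrete configurations for autotuning; otherwise
    # return the single configuration.
    keys = list(params)
    candidates = [v if isinstance(v, list) else [v] for v in params.values()]
    configs = [dict(zip(keys, combo)) for combo in itertools.product(*candidates)]
    return configs[0] if len(configs) == 1 else configs
```

With this shape, `expand(bits=4)` yields one config, while `expand(bits=[4, 8], group_size=32)` yields two configs for the tuner to try.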
Quantize Interface
PyTorch Weight-only Quantization
Configuration
Quantize Interface