Commit 282850d

fix docstring format issue (#3515)
1 parent cc3c909 commit 282850d

File tree

1 file changed, +41 -45 lines changed


cpu/2.6.0+cpu/tutorials/api_doc.html

+41 -45
@@ -1413,65 +1413,61 @@ <h2>Graph Optimization<a class="headerlink" href="#graph-optimization" title="Li
 </dd></dl>
 
 <dl class="py function">
-<dt class="sig sig-object py" id="ipex.quantization.get_weight_only_quant_qconfig_mapping">
-<span class="sig-prename descclassname"><span class="pre">ipex.quantization.</span></span><span class="sig-name descname"><span class="pre">get_weight_only_quant_qconfig_mapping</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_dtype</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">WoqWeightDtype.INT8</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lowp_mode</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">WoqLowpMode.NONE</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">act_quant_mode</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">WoqActQuantMode.PER_BATCH_IC_BLOCK_SYM</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">group_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">-1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_qscheme</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">WoqWeightQScheme.UNDEFINED</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#ipex.quantization.get_weight_only_quant_qconfig_mapping" title="Link to this definition"></a></dt>
-<dd><p>Configuration for weight-only quantization (WOQ) for LLM.
-:param weight_dtype: Data type for weight, WoqWeightDtype.INT8/INT4/NF4, etc.
-:param lowp_mode: specify the lowest precision data type for computation. Data types</p>
-<blockquote>
-<div><p>that has even lower precision won’t be used.
-Not necessarily related to activation or weight dtype.
-- NONE(0): Use the activation data type for computation.
-- FP16(1): Use float16 (a.k.a. half) as the lowest precision for computation.
-- BF16(2): Use bfloat16 as the lowest precision for computation.
-- INT8(3): Use INT8 as the lowest precision for computation.</p>
-<blockquote>
-<div><p>Activation is quantized to int8 at runtime in this case.</p>
-</div></blockquote>
-</div></blockquote>
+<dt class="sig sig-object py" id="intel_extension_for_pytorch.quantization.get_weight_only_quant_qconfig_mapping">
+<span class="sig-prename descclassname"><span class="pre">intel_extension_for_pytorch.quantization.</span></span><span class="sig-name descname"><span class="pre">get_weight_only_quant_qconfig_mapping</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_dtype</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">WoqWeightDtype.INT8</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lowp_mode</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">WoqLowpMode.NONE</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">act_quant_mode</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">WoqActQuantMode.PER_BATCH_IC_BLOCK_SYM</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">group_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">-1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_qscheme</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">WoqWeightQScheme.UNDEFINED</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#intel_extension_for_pytorch.quantization.get_weight_only_quant_qconfig_mapping" title="Link to this definition"></a></dt>
+<dd><p>Configuration for weight-only quantization (WOQ) for LLM.</p>
 <dl class="field-list simple">
 <dt class="field-odd">Parameters<span class="colon">:</span></dt>
 <dd class="field-odd"><ul class="simple">
-<li><p><strong>act_quant_mode</strong> – Quantization granularity of activation. It only works for lowp_mode=INT8.
+<li><p><strong>weight_dtype</strong> – Data type for weight, WoqWeightDtype.INT8/INT4/NF4, etc.</p></li>
+<li><p><strong>lowp_mode</strong><p>specify the lowest precision data type for computation. Data types
+that has even lower precision won’t be used.
+Not necessarily related to activation or weight dtype.</p>
+<ul>
+<li><p>NONE(0): Use the activation data type for computation.</p></li>
+<li><p>FP16(1): Use float16 (a.k.a. half) as the lowest precision for computation.</p></li>
+<li><p>BF16(2): Use bfloat16 as the lowest precision for computation.</p></li>
+<li><p>INT8(3): Use INT8 as the lowest precision for computation.
+Activation is quantized to int8 at runtime in this case.</p></li>
+</ul>
+</p></li>
+<li><p><strong>act_quant_mode</strong><p>Quantization granularity of activation. It only works for lowp_mode=INT8.
 It has no effect in other cases. The tensor is divided into groups, and
 each group is quantized with its own quantization parameters.
-Suppose the activation has shape batch_size by input_channel (IC).
-- PER_TENSOR(0): Use the same quantization parameters for the entire tensor.
-- PER_IC_BLOCK(1): Tensor is divided along IC with group size = IC_BLOCK.
-- PER_BATCH(2): Tensor is divided along batch_size with group size = 1.
-- PER_BATCH_IC_BLOCK(3): Tenosr is divided into blocks of 1 x IC_BLOCK.
-Note that IC_BLOCK is determined by group_size automatically.</p></li>
+Suppose the activation has shape batch_size by input_channel (IC).</p>
+<ul>
+<li><p>PER_TENSOR(0): Use the same quantization parameters for the entire tensor.</p></li>
+<li><p>PER_IC_BLOCK(1): Tensor is divided along IC with group size = IC_BLOCK.</p></li>
+<li><p>PER_BATCH(2): Tensor is divided along batch_size with group size = 1.</p></li>
+<li><p>PER_BATCH_IC_BLOCK(3): Tenosr is divided into blocks of 1 x IC_BLOCK.</p></li>
+</ul>
+<p>Note that IC_BLOCK is determined by group_size automatically.</p>
+</p></li>
 <li><p><strong>group_size</strong><p>Control quantization granularity along input channel (IC) dimension of weight.
-Must be a positive power of 2 (i.e., 2^k, k &gt; 0) or -1.
-If group_size = -1:</p>
-<blockquote>
-<div><dl class="simple">
-<dt>If act_quant_mode = PER_TENSOR ro PER_BATCH:</dt><dd><p>No grouping along IC for both activation and weight</p>
-</dd>
-<dt>If act_quant_mode = PER_IC_BLOCK or PER_BATCH_IC_BLOCK:</dt><dd><p>No grouping along IC for weight. For activation,
-IC_BLOCK is determined automatically by IC.</p>
-</dd>
-</dl>
-</div></blockquote>
-<dl class="simple">
-<dt>If group_size &gt; 0:</dt><dd><p>act_quant_mode can be any. If act_quant_mode is PER_IC_BLOCK(_SYM)
-or PER_BATCH_IC_BLOCK(_SYM), weight is grouped along IC by group_size.
-The IC_BLOCK for activation is determined by group_size automatically.
-Each group has its own quantization parameters.</p>
-</dd>
-</dl>
+Must be a positive power of 2 (i.e., 2^k, k &gt; 0) or -1. The rule is</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">If</span> <span class="n">group_size</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
+<span class="n">If</span> <span class="n">act_quant_mode</span> <span class="o">=</span> <span class="n">PER_TENSOR</span> <span class="n">ro</span> <span class="n">PER_BATCH</span><span class="p">:</span>
+<span class="n">No</span> <span class="n">grouping</span> <span class="n">along</span> <span class="n">IC</span> <span class="k">for</span> <span class="n">both</span> <span class="n">activation</span> <span class="ow">and</span> <span class="n">weight</span>
+<span class="n">If</span> <span class="n">act_quant_mode</span> <span class="o">=</span> <span class="n">PER_IC_BLOCK</span> <span class="ow">or</span> <span class="n">PER_BATCH_IC_BLOCK</span><span class="p">:</span>
+<span class="n">No</span> <span class="n">grouping</span> <span class="n">along</span> <span class="n">IC</span> <span class="k">for</span> <span class="n">weight</span><span class="o">.</span> <span class="n">For</span> <span class="n">activation</span><span class="p">,</span>
+<span class="n">IC_BLOCK</span> <span class="ow">is</span> <span class="n">determined</span> <span class="n">automatically</span> <span class="n">by</span> <span class="n">IC</span><span class="o">.</span>
+<span class="n">If</span> <span class="n">group_size</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
+<span class="n">act_quant_mode</span> <span class="n">can</span> <span class="n">be</span> <span class="nb">any</span><span class="o">.</span> <span class="n">If</span> <span class="n">act_quant_mode</span> <span class="ow">is</span> <span class="n">PER_IC_BLOCK</span><span class="p">(</span><span class="n">_SYM</span><span class="p">)</span>
+<span class="ow">or</span> <span class="n">PER_BATCH_IC_BLOCK</span><span class="p">(</span><span class="n">_SYM</span><span class="p">),</span> <span class="n">weight</span> <span class="ow">is</span> <span class="n">grouped</span> <span class="n">along</span> <span class="n">IC</span> <span class="n">by</span> <span class="n">group_size</span><span class="o">.</span>
+<span class="n">The</span> <span class="n">IC_BLOCK</span> <span class="k">for</span> <span class="n">activation</span> <span class="ow">is</span> <span class="n">determined</span> <span class="n">by</span> <span class="n">group_size</span> <span class="n">automatically</span><span class="o">.</span>
+<span class="n">Each</span> <span class="n">group</span> <span class="n">has</span> <span class="n">its</span> <span class="n">own</span> <span class="n">quantization</span> <span class="n">parameters</span><span class="o">.</span>
+</pre></div>
+</div>
 </p></li>
 <li><p><strong>weight_qscheme</strong><p>Specify how to quantize weight, asymmetrically or symmetrically. Generally,
 asymmetric quantization has better accuracy than symmetric quantization at
 the cost of performance. Symmetric quantization is faster but may have worse
 accuracy. Default is undefined and determined by weight dtype: asymmetric in
 most cases and symmetric if</p>
-<blockquote>
-<div><ol class="arabic simple">
+<ol class="arabic simple">
 <li><p>weight_dtype is NF4, or</p></li>
 <li><p>weight_dtype is INT8 and lowp_mode is INT8.</p></li>
 </ol>
-</div></blockquote>
 <p>One must use WoqWeightQScheme.SYMMETRIC in the above two cases.</p>
 </p></li>
 </ul>
@@ -1781,4 +1777,4 @@ <h2>Graph Optimization<a class="headerlink" href="#graph-optimization" title="Li
 </script>
 
 </body>
-</html>
+</html>
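
The reflowed docstring documents the full WOQ configuration surface. A minimal usage sketch follows; it is not part of this diff, the import location of the Woq* enums (assumed to sit next to the function in the quantization module) is an assumption, and the llm.optimize hand-off is likewise assumed rather than confirmed here.

# Minimal usage sketch (assumed imports; the enum values are those named
# in the signature and docstring above).
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import (
    WoqWeightDtype,
    WoqLowpMode,
)

# INT4 weights, bfloat16 as the lowest compute precision, and weight
# grouped along the input-channel (IC) dimension in blocks of 128
# (group_size must be a positive power of 2, or -1 for no grouping).
qconfig_mapping = ipex.quantization.get_weight_only_quant_qconfig_mapping(
    weight_dtype=WoqWeightDtype.INT4,
    lowp_mode=WoqLowpMode.BF16,
    group_size=128,
)
# The mapping is then handed to the LLM optimization entry point, e.g.
# (call name and keyword assumed, not confirmed by this diff):
# model = ipex.llm.optimize(model, quantization_config=qconfig_mapping)

act_quant_mode is left at its default here: per the docstring, it only takes effect when lowp_mode is INT8 and is ignored otherwise.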
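The group_size rule that this commit reflows into a literal block reads more easily as executable logic. The helper below is hypothetical (not an IPEX API) and restates only what the docstring says, with mode names as plain strings for illustration.

# Hypothetical helper: restates the documented group_size rule.
def describe_woq_grouping(group_size: int, act_quant_mode: str) -> str:
    ic_block_modes = {
        "PER_IC_BLOCK", "PER_BATCH_IC_BLOCK",
        "PER_IC_BLOCK_SYM", "PER_BATCH_IC_BLOCK_SYM",
    }
    if group_size == -1:
        if act_quant_mode in ic_block_modes:
            # No grouping along IC for weight; the activation IC_BLOCK
            # is determined automatically by IC.
            return "weight ungrouped; activation IC_BLOCK derived from IC"
        # PER_TENSOR or PER_BATCH: no grouping along IC at all.
        return "no grouping along IC for activation or weight"
    if group_size >= 2 and group_size & (group_size - 1) == 0:
        # act_quant_mode can be anything here. For the *_IC_BLOCK(_SYM)
        # modes, weight is grouped along IC by group_size and the
        # activation IC_BLOCK is derived from group_size; each group
        # carries its own quantization parameters.
        if act_quant_mode in ic_block_modes:
            return (f"weight grouped along IC by {group_size}; "
                    "activation IC_BLOCK derived from group_size")
        return f"weight grouped along IC by {group_size}"
    raise ValueError("group_size must be -1 or a positive power of 2 (2^k, k > 0)")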
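The weight_qscheme paragraph carries a hard constraint: symmetric quantization is required when (1) weight_dtype is NF4, or (2) weight_dtype is INT8 with lowp_mode INT8. A sketch of both cases, under the same assumed imports as above:

import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import (
    WoqWeightDtype,
    WoqLowpMode,
    WoqWeightQScheme,
)

# Case 1: NF4 weights must use symmetric weight quantization.
nf4_qconfig = ipex.quantization.get_weight_only_quant_qconfig_mapping(
    weight_dtype=WoqWeightDtype.NF4,
    weight_qscheme=WoqWeightQScheme.SYMMETRIC,
)

# Case 2: INT8 weights with INT8 compute must also be symmetric.
int8_qconfig = ipex.quantization.get_weight_only_quant_qconfig_mapping(
    weight_dtype=WoqWeightDtype.INT8,
    lowp_mode=WoqLowpMode.INT8,
    weight_qscheme=WoqWeightQScheme.SYMMETRIC,
)

Per the docstring, the default (WoqWeightQScheme.UNDEFINED) already resolves to symmetric in these two cases; passing SYMMETRIC explicitly simply makes the constraint visible.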
