forked from EricLBuehler/mistral.rs
Add ISQ topology feature (EricLBuehler#701)
* Add topology for isq
* Support single layer
* Add apis and connect to some public apis
* Use topology in isq quantization
* Works now
* Add demo topography
* Fixes
* Sorting a bit
* Add example
* Some error checking
* Add example and docs, add default
* Typos
* Update deps
1 parent 3d84a05, commit 754bb6a
Showing 62 changed files with 874 additions and 278 deletions.
@@ -0,0 +1,50 @@
# Model topology configuration

To support a per-layer mix of ISQ types, Mistral.rs can load a model topology YAML file. The file is formatted as follows:

1) Top-level keys are either:
    - A range of layers (`start-end`) where `start < end`. Both `start` and `end` are inclusive
    - A single layer number
2) The topology for the range or layer:
    - A single key (`isq`) which maps to a single value, which can be any [ISQ type](ISQ.md#isq-quantization-types)

Note that:
- The topology for a range is expanded to every layer in the range
- If ranges overlap, the range with the higher end layer takes precedence and overwrites the other
- Layers not covered by the topology have no topology mapping and inherit any other ISQ setting (e.g. `--isq`/`in_situ_quant`)
- For layers that are covered, the topology value overrides any other ISQ setting (e.g. `--isq`/`in_situ_quant`)

```yml
0-8:
  isq: Q3K
8-16:
  isq: Q4K
16-24:
  isq: Q6K
# Skip 24-28
28-32:
  isq: Q8_0
```
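
The expansion and precedence rules described above can be sketched in a few lines of Python. This is a minimal illustration that takes the rules at face value (inclusive ranges, higher end layer wins); it is not the Mistral.rs implementation, and `parse_key`, `expand_topology`, and the `TOPOLOGY` dict are hypothetical names. The dict stands in for the YAML file after parsing, so the sketch needs no YAML library.

```python
import re

# The example topology as it might look once the YAML has been parsed.
# Written as a dict literal (an assumption) to keep the sketch dependency-free.
TOPOLOGY = {
    "0-8": {"isq": "Q3K"},
    "8-16": {"isq": "Q4K"},
    "16-24": {"isq": "Q6K"},
    "28-32": {"isq": "Q8_0"},
}

_KEY = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_key(key):
    """Return (start, end), both inclusive, for a key 'N' or 'start-end'."""
    match = _KEY.match(str(key))
    if match is None:
        raise ValueError(f"invalid topology key: {key!r}")
    start = int(match.group(1))
    if match.group(2) is None:
        return start, start  # single layer
    end = int(match.group(2))
    if start >= end:
        raise ValueError(f"range {key!r} must satisfy start < end")
    return start, end


def expand_topology(topology):
    """Map every covered layer index to its ISQ type."""
    spans = [(parse_key(key), value["isq"]) for key, value in topology.items()]
    # Apply ranges in ascending order of end layer, so that where ranges
    # overlap, the one with the higher end layer is written last and wins.
    spans.sort(key=lambda span: span[0][1])
    layers = {}
    for (start, end), isq in spans:
        for layer in range(start, end + 1):
            layers[layer] = isq
    return layers


layers = expand_topology(TOPOLOGY)
print(layers[8])       # layer shared by 0-8 and 8-16 -> Q4K
print(layers.get(26))  # not covered by any range -> None
```

Under these assumptions, layer 8 ends up with `Q4K` because `8-16` has the higher end layer, and layers with no entry are left for any global ISQ setting to handle.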
Model topologies may be applied to the following model types:
- `plain`/`Plain`
- `xlora`/`XLora`
- `lora`/`Lora`
- `vision-plain`/`VisionPlain`

## CLI example
```
cargo run --features ... -- -i plain -m microsoft/Phi-3-mini-128k-instruct -a phi3 --topology topologies/isq.yml
```
## HTTP server example
```
cargo run --features ... -- --port 1234 plain -m microsoft/Phi-3-mini-128k-instruct -a phi3 --topology topologies/isq.yml
```
## Rust example
Example [here](../mistralrs/examples/topology/main.rs).
## Python example
Example [here](../examples/python/topology.py).
@@ -0,0 +1,25 @@
from mistralrs import Runner, Which, ChatCompletionRequest, Architecture

runner = Runner(
    which=Which.Plain(
        model_id="mistralai/Mistral-7B-Instruct-v0.1",
        arch=Architecture.Mistral,
        topology="topologies/isq.yml",
    ),
    in_situ_quant="Q4K",
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)
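
The script above passes both a topology file and `in_situ_quant="Q4K"`. Going by the documented rules (a topology entry overrides the global ISQ setting, and uncovered layers inherit it), the per-layer outcome could be reasoned about as in the sketch below. It is illustrative only: `effective_isq` and the 4-layer `topology_layers` map are hypothetical and not part of the `mistralrs` API.

```python
def effective_isq(layer, topology_layers, default_isq=None):
    """Return the ISQ type for one layer.

    A topology entry wins when present; otherwise the layer falls back to
    the global default (None meaning no ISQ is applied at all).
    """
    return topology_layers.get(layer, default_isq)


# Hypothetical expanded topology for a tiny 4-layer model:
# only layers 0 and 1 are covered.
topology_layers = {0: "Q3K", 1: "Q3K"}

plan = [effective_isq(i, topology_layers, default_isq="Q4K") for i in range(4)]
print(plan)  # ['Q3K', 'Q3K', 'Q4K', 'Q4K']
```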