
Commit d987c0e

Author: reidliu41

[Misc] Use collapsible blocks for benchmark examples

Signed-off-by: reidliu41 <[email protected]>

Parent: 0f9e735

File tree: 1 file changed (+60 −34 lines)


benchmarks/README.md

Lines changed: 60 additions & 34 deletions
@@ -4,7 +4,7 @@ This README guides you through running benchmark tests with the extensive
 datasets supported on vLLM. It’s a living document, updated as new features and datasets
 become available.
 
-## Dataset Overview
+**Dataset Overview**
 
 <table style="width:100%; border-collapse: collapse;">
 <thead>
@@ -82,7 +82,10 @@ become available.
 **Note**: HuggingFace dataset's `dataset-name` should be set to `hf`
 
 ---
-## Example - Online Benchmark
+<details>
+<summary><b>🚀 Example - Online Benchmark</b></summary>
+
+<br/>
 
 First start serving your model
 
@@ -130,7 +133,8 @@ P99 ITL (ms): 8.39
 ==================================================
 ```
 
-### Custom Dataset
+**Custom Dataset**
+
 If the dataset you want to benchmark is not supported yet in vLLM, even then you can benchmark on it using `CustomDataset`. Your data needs to be in `.jsonl` format and needs to have "prompt" field per entry, e.g., data.jsonl
 
 ```
@@ -162,7 +166,7 @@ python3 benchmarks/benchmark_serving.py --port 9001 --save-result --save-detaile
 
 You can skip applying chat template if your data already has it by using `--custom-skip-chat-template`.
 
-### VisionArena Benchmark for Vision Language Models
+**VisionArena Benchmark for Vision Language Models**
 
 ```bash
 # need a model with vision capability here
@@ -180,7 +184,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
 --num-prompts 1000
 ```
 
-### InstructCoder Benchmark with Speculative Decoding
+**InstructCoder Benchmark with Speculative Decoding**
 
 ``` bash
 VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
@@ -197,7 +201,7 @@ python3 benchmarks/benchmark_serving.py \
 --num-prompts 2048
 ```
 
-### Other HuggingFaceDataset Examples
+**Other HuggingFaceDataset Examples**
 
 ```bash
 vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
@@ -251,7 +255,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
 --num-prompts 80
 ```
 
-### Running With Sampling Parameters
+**Running With Sampling Parameters**
 
 When using OpenAI-compatible backends such as `vllm`, optional sampling
 parameters can be specified. Example client command:
@@ -269,7 +273,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
 --num-prompts 10
 ```
 
-### Running With Ramp-Up Request Rate
+**Running With Ramp-Up Request Rate**
 
 The benchmark tool also supports ramping up the request rate over the
 duration of the benchmark run. This can be useful for stress testing the
@@ -284,8 +288,12 @@ The following arguments can be used to control the ramp-up:
 - `--ramp-up-start-rps`: The request rate at the beginning of the benchmark.
 - `--ramp-up-end-rps`: The request rate at the end of the benchmark.
 
----
-## Example - Offline Throughput Benchmark
+</details>
+
+<details>
+<summary><b>📈 Example - Offline Throughput Benchmark</b></summary>
+
+<br/>
 
 ```bash
 python3 vllm/benchmarks/benchmark_throughput.py \
@@ -303,7 +311,7 @@ Total num prompt tokens: 5014
 Total num output tokens: 1500
 ```
 
-### VisionArena Benchmark for Vision Language Models
+**VisionArena Benchmark for Vision Language Models**
 
 ``` bash
 python3 vllm/benchmarks/benchmark_throughput.py \
@@ -323,7 +331,7 @@ Total num prompt tokens: 14527
 Total num output tokens: 1280
 ```
 
-### InstructCoder Benchmark with Speculative Decoding
+**InstructCoder Benchmark with Speculative Decoding**
 
 ``` bash
 VLLM_WORKER_MULTIPROC_METHOD=spawn \
@@ -347,7 +355,7 @@ Total num prompt tokens: 261136
 Total num output tokens: 204800
 ```
 
-### Other HuggingFaceDataset Examples
+**Other HuggingFaceDataset Examples**
 
 **`lmms-lab/LLaVA-OneVision-Data`**
 
@@ -386,7 +394,7 @@ python3 benchmarks/benchmark_throughput.py \
 --num-prompts 10
 ```
 
-### Benchmark with LoRA Adapters
+**Benchmark with LoRA Adapters**
 
 ``` bash
 # download dataset
@@ -403,18 +411,22 @@ python3 vllm/benchmarks/benchmark_throughput.py \
 --lora-path yard1/llama-2-7b-sql-lora-test
 ```
 
----
-## Example - Structured Output Benchmark
+</details>
+
+<details>
+<summary><b>🛠️ Example - Structured Output Benchmark</b></summary>
+
+<br/>
 
 Benchmark the performance of structured output generation (JSON, grammar, regex).
 
-### Server Setup
+**Server Setup**
 
 ```bash
 vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
 ```
 
-### JSON Schema Benchmark
+**JSON Schema Benchmark**
 
 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -426,7 +438,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 --num-prompts 1000
 ```
 
-### Grammar-based Generation Benchmark
+**Grammar-based Generation Benchmark**
 
 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -438,7 +450,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 --num-prompts 1000
 ```
 
-### Regex-based Generation Benchmark
+**Regex-based Generation Benchmark**
 
 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -449,7 +461,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 --num-prompts 1000
 ```
 
-### Choice-based Generation Benchmark
+**Choice-based Generation Benchmark**
 
 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -460,7 +472,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 --num-prompts 1000
 ```
 
-### XGrammar Benchmark Dataset
+**XGrammar Benchmark Dataset**
 
 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -471,12 +483,16 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 --num-prompts 1000
 ```
 
----
-## Example - Long Document QA Throughput Benchmark
+</details>
+
+<details>
+<summary><b>📚 Example - Long Document QA Benchmark</b></summary>
+
+<br/>
 
 Benchmark the performance of long document question-answering with prefix caching.
 
-### Basic Long Document QA Test
+**Basic Long Document QA Test**
 
 ```bash
 python3 benchmarks/benchmark_long_document_qa_throughput.py \
@@ -488,7 +504,7 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
 --repeat-count 5
 ```
 
-### Different Repeat Modes
+**Different Repeat Modes**
 
 ```bash
 # Random mode (default) - shuffle prompts randomly
@@ -519,12 +535,16 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
 --repeat-mode interleave
 ```
 
----
-## Example - Prefix Caching Benchmark
+</details>
+
+<details>
+<summary><b>🗂️ Example - Prefix Caching Benchmark</b></summary>
+
+<br/>
 
 Benchmark the efficiency of automatic prefix caching.
 
-### Fixed Prompt with Prefix Caching
+**Fixed Prompt with Prefix Caching**
 
 ```bash
 python3 benchmarks/benchmark_prefix_caching.py \
@@ -535,7 +555,7 @@ python3 benchmarks/benchmark_prefix_caching.py \
 --input-length-range 128:256
 ```
 
-### ShareGPT Dataset with Prefix Caching
+**ShareGPT Dataset with Prefix Caching**
 
 ```bash
 # download dataset
@@ -550,12 +570,16 @@ python3 benchmarks/benchmark_prefix_caching.py \
 --input-length-range 128:256
 ```
 
----
-## Example - Request Prioritization Benchmark
+</details>
+
+<details>
+<summary><b>⚡ Example - Request Prioritization Benchmark</b></summary>
+
+<br/>
 
 Benchmark the performance of request prioritization in vLLM.
 
-### Basic Prioritization Test
+**Basic Prioritization Test**
 
 ```bash
 python3 benchmarks/benchmark_prioritization.py \
@@ -566,7 +590,7 @@ python3 benchmarks/benchmark_prioritization.py \
 --scheduling-policy priority
 ```
 
-### Multiple Sequences per Prompt
+**Multiple Sequences per Prompt**
 
 ```bash
 python3 benchmarks/benchmark_prioritization.py \
@@ -577,3 +601,5 @@ python3 benchmarks/benchmark_prioritization.py \
 --scheduling-policy priority \
 --n 2
 ```
+
+</details>
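The pattern this commit applies is GitHub-flavored Markdown's support for raw HTML disclosure elements: a `<details>` block collapses its contents, and the `<summary>` line is rendered as the always-visible, clickable title. A minimal sketch of one wrapped section, assembled from the added lines above (the body is abbreviated here with a placeholder comment):

```markdown
---
<details>
<summary><b>🚀 Example - Online Benchmark</b></summary>

<br/>

First start serving your model

<!-- ... rest of the section: prose, bold sub-headings, and fenced code blocks
     all render normally once the block is expanded ... -->

</details>
```

The former `##` section headings move into the `<summary>` lines, and the former `###` sub-headings become bold text inside the collapsible blocks.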
