@@ -4,7 +4,7 @@ This README guides you through running benchmark tests with the extensive
datasets supported on vLLM. It’s a living document, updated as new features and datasets
become available.

- ## Dataset Overview
+ **Dataset Overview**

<table style="width:100%; border-collapse: collapse;">
<thead>
@@ -82,7 +82,10 @@ become available.
**Note**: For HuggingFace datasets, `dataset-name` should be set to `hf`.
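As an illustrative sketch (not part of the diff above), a HuggingFace-backed run might look like the following; the model, dataset, and split are placeholders borrowed from examples elsewhere in this README:

```bash
# Hedged sketch: benchmark an HF-hosted dataset by setting --dataset-name to hf
# and passing the HF repo id via --dataset-path. All values are illustrative.
python3 vllm/benchmarks/benchmark_serving.py \
  --backend openai-chat \
  --endpoint /v1/chat/completions \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --dataset-name hf \
  --dataset-path lmms-lab/LLaVA-OneVision-Data \
  --hf-split train \
  --num-prompts 10
```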

---

- ## Example - Online Benchmark
+ <details>
+ <summary><b>🚀 Example - Online Benchmark</b></summary>
+
+ <br />

First, start serving your model:
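For illustration, a minimal serve command sketch, reusing the model that appears in the structured output section further down; any vLLM-served model works here:

```bash
# Hedged sketch: the model name is reused from elsewhere in this README.
vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
```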
@@ -130,7 +133,8 @@ P99 ITL (ms): 8.39
==================================================
```

- ### Custom Dataset
+ **Custom Dataset**
+

If the dataset you want to benchmark is not yet supported in vLLM, you can still benchmark it using `CustomDataset`. Your data needs to be in `.jsonl` format, with a "prompt" field per entry, e.g., data.jsonl:

```
@@ -162,7 +166,7 @@ python3 benchmarks/benchmark_serving.py --port 9001 --save-result --save-detaile

You can skip applying the chat template if your data already has it by using `--custom-skip-chat-template`.
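A minimal sketch of such a run (the model name, dataset path, and exact flag set are illustrative; combine with the full command shown above as needed):

```bash
# Hedged sketch: skip the chat template for data that is already templated.
# Model name and dataset path are illustrative.
python3 benchmarks/benchmark_serving.py \
  --backend vllm \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --endpoint /v1/completions \
  --dataset-name custom \
  --dataset-path ./data.jsonl \
  --custom-skip-chat-template \
  --num-prompts 100
```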

- ### VisionArena Benchmark for Vision Language Models
+ **VisionArena Benchmark for Vision Language Models**

```bash
# need a model with vision capability here
@@ -180,7 +184,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 1000
```

- ### InstructCoder Benchmark with Speculative Decoding
+ **InstructCoder Benchmark with Speculative Decoding**

```bash
VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
@@ -197,7 +201,7 @@ python3 benchmarks/benchmark_serving.py \
--num-prompts 2048
```

- ### Other HuggingFaceDataset Examples
+ **Other HuggingFaceDataset Examples**

```bash
vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
@@ -251,7 +255,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 80
```

- ### Running With Sampling Parameters
+ **Running With Sampling Parameters**

When using OpenAI-compatible backends such as `vllm`, optional sampling
parameters can be specified. Example client command:
@@ -269,7 +273,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 10
```
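The sampling flags themselves are outside the visible hunk; as a hedged sketch, a command of this shape is what the section describes, with illustrative values, model, and dataset path:

```bash
# Hedged sketch: optional sampling parameters passed to an OpenAI-compatible backend.
# Values, model, and dataset path are illustrative.
python3 vllm/benchmarks/benchmark_serving.py \
  --backend vllm \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --endpoint /v1/completions \
  --dataset-name sharegpt \
  --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
  --top-k 10 \
  --top-p 0.9 \
  --temperature 0.5 \
  --num-prompts 10
```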

- ### Running With Ramp-Up Request Rate
+ **Running With Ramp-Up Request Rate**

The benchmark tool also supports ramping up the request rate over the
duration of the benchmark run. This can be useful for stress testing the
@@ -284,8 +288,12 @@ The following arguments can be used to control the ramp-up:
- `--ramp-up-start-rps`: The request rate at the beginning of the benchmark.
- `--ramp-up-end-rps`: The request rate at the end of the benchmark.
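A hedged sketch of a ramp-up run built from the two flags listed above; the `--ramp-up-strategy` selector is an assumption here (it is not visible in this hunk), and the model and dataset settings are illustrative:

```bash
# Hedged sketch: ramp the request rate from 1 to 20 req/s over the run.
# --ramp-up-strategy is assumed; model and dataset settings are illustrative.
python3 vllm/benchmarks/benchmark_serving.py \
  --backend vllm \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --endpoint /v1/completions \
  --dataset-name random \
  --ramp-up-strategy linear \
  --ramp-up-start-rps 1 \
  --ramp-up-end-rps 20 \
  --num-prompts 200
```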

- ---
- ## Example - Offline Throughput Benchmark
+ </details>
+
+ <details>
+ <summary><b>📈 Example - Offline Throughput Benchmark</b></summary>
+
+ <br />

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
@@ -303,7 +311,7 @@ Total num prompt tokens: 5014
Total num output tokens: 1500
```

- ### VisionArena Benchmark for Vision Language Models
+ **VisionArena Benchmark for Vision Language Models**

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
@@ -323,7 +331,7 @@ Total num prompt tokens: 14527
Total num output tokens: 1280
```

- ### InstructCoder Benchmark with Speculative Decoding
+ **InstructCoder Benchmark with Speculative Decoding**

```bash
VLLM_WORKER_MULTIPROC_METHOD=spawn \
@@ -347,7 +355,7 @@ Total num prompt tokens: 261136
Total num output tokens: 204800
```

- ### Other HuggingFaceDataset Examples
+ **Other HuggingFaceDataset Examples**

**`lmms-lab/LLaVA-OneVision-Data`**

@@ -386,7 +394,7 @@ python3 benchmarks/benchmark_throughput.py \
--num-prompts 10
```

- ### Benchmark with LoRA Adapters
+ **Benchmark with LoRA Adapters**

```bash
# download dataset
@@ -403,18 +411,22 @@ python3 vllm/benchmarks/benchmark_throughput.py \
--lora-path yard1/llama-2-7b-sql-lora-test
```

- ---
- ## Example - Structured Output Benchmark
+ </details>
+
+ <details>
+ <summary><b>🛠️ Example - Structured Output Benchmark</b></summary>
+
+ <br />

Benchmark the performance of structured output generation (JSON, grammar, regex).

- ### Server Setup
+ **Server Setup**

```bash
vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
```

- ### JSON Schema Benchmark
+ **JSON Schema Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -426,7 +438,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### Grammar-based Generation Benchmark
+ **Grammar-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -438,7 +450,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### Regex-based Generation Benchmark
+ **Regex-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -449,7 +461,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### Choice-based Generation Benchmark
+ **Choice-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -460,7 +472,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### XGrammar Benchmark Dataset
+ **XGrammar Benchmark Dataset**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -471,12 +483,16 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ---
- ## Example - Long Document QA Throughput Benchmark
+ </details>
+
+ <details>
+ <summary><b>📚 Example - Long Document QA Benchmark</b></summary>
+
+ <br />

Benchmark the performance of long document question-answering with prefix caching.

- ### Basic Long Document QA Test
+ **Basic Long Document QA Test**

```bash
python3 benchmarks/benchmark_long_document_qa_throughput.py \
@@ -488,7 +504,7 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
--repeat-count 5
```

- ### Different Repeat Modes
+ **Different Repeat Modes**

```bash
# Random mode (default) - shuffle prompts randomly
@@ -519,12 +535,16 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
--repeat-mode interleave
```

- ---
- ## Example - Prefix Caching Benchmark
+ </details>
+
+ <details>
+ <summary><b>🗂️ Example - Prefix Caching Benchmark</b></summary>
+
+ <br />

Benchmark the efficiency of automatic prefix caching.

- ### Fixed Prompt with Prefix Caching
+ **Fixed Prompt with Prefix Caching**

```bash
python3 benchmarks/benchmark_prefix_caching.py \
@@ -535,7 +555,7 @@ python3 benchmarks/benchmark_prefix_caching.py \
--input-length-range 128:256
```

- ### ShareGPT Dataset with Prefix Caching
+ **ShareGPT Dataset with Prefix Caching**

```bash
# download dataset
@@ -550,12 +570,16 @@ python3 benchmarks/benchmark_prefix_caching.py \
--input-length-range 128:256
```

- ---
- ## Example - Request Prioritization Benchmark
+ </details>
+
+ <details>
+ <summary><b>⚡ Example - Request Prioritization Benchmark</b></summary>
+
+ <br />

Benchmark the performance of request prioritization in vLLM.

- ### Basic Prioritization Test
+ **Basic Prioritization Test**

```bash
python3 benchmarks/benchmark_prioritization.py \
@@ -566,7 +590,7 @@ python3 benchmarks/benchmark_prioritization.py \
--scheduling-policy priority
```

- ### Multiple Sequences per Prompt
+ **Multiple Sequences per Prompt**

```bash
python3 benchmarks/benchmark_prioritization.py \
@@ -577,3 +601,5 @@ python3 benchmarks/benchmark_prioritization.py \
--scheduling-policy priority \
--n 2
```
+
+ </details>