integrate aiter #516
base: main
Conversation
Can you please direct the PR to the upstream vllm (https://github.com/vllm-project/vllm.git) instead of rocm/vllm ?
In your upstream PR, please use
In the upstream PR, performance gain values and lm_eval results have to be attached alongside the PR description.
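For reference, a minimal sketch of how such lm_eval numbers could be collected (this assumes the lm-evaluation-harness vLLM backend; the task name, batch size, and `<model_path>` placeholder are illustrative, not taken from this PR):

```shell
# Hypothetical example: evaluate the same checkpoint with lm-evaluation-harness.
# <model_path> is a placeholder for the local model directory; task choice is arbitrary.
lm_eval --model vllm \
  --model_args pretrained=<model_path>,tensor_parallel_size=8,dtype=float16 \
  --tasks gsm8k \
  --batch_size auto
```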
@fsx950223 FYI, upstream refers to https://github.com/vllm-project/vllm , and may I know which AITER version you are using?
8d167e698fb5ecf54d1315e2cae0da6c6a2746b5
Is it from a branch? I couldn't find the commit on the main branch of AITER.
Please direct your PRs to the upstream vllm (https://github.com/vllm-project/vllm.git).
Accepting PRs into the ROCm fork (https://github.com/ROCm/vllm) will require a clear, previously communicated exception.
This only works with prebuilt AITER: PREBUILD_KERNELS=1 python setup.py develop.
VLLM_ROCM_USE_AITER=1 VLLM_USE_V1=1 vllm serve /models/models--amd--Llama-3.1-405B-Instruct-FP8-KV/snapshots/2505537398e7cfda52f6d666f315c03db8e4697c/ --tensor-parallel-size 8 --gpu-memory-utilization 0.9 --trust-remote-code --disable-log-requests --block-size 128 --max-model-len 32768 --dtype float16 --quantization fp8 --no-enable-prefix-caching
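Once the server is up, a quick sanity check against the OpenAI-compatible endpoint might look like the following (this assumes the default port 8000; the prompt and token count are arbitrary):

```shell
# Hypothetical smoke test: vllm serve exposes an OpenAI-compatible API,
# by default on http://localhost:8000. The "model" field should match the
# path passed to `vllm serve` above.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "/models/models--amd--Llama-3.1-405B-Instruct-FP8-KV/snapshots/2505537398e7cfda52f6d666f315c03db8e4697c/",
        "prompt": "Hello",
        "max_tokens": 16
      }'
```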