integrate aiter #516

Open · wants to merge 31 commits into main
Conversation

@fsx950223 commented Apr 18, 2025

Please direct your PRs to the upstream vllm (https://github.com/vllm-project/vllm.git)

Accepting PRs into the ROCm fork (https://github.com/ROCm/vllm) will require a clear, previously communicated exception.

This only works with a prebuilt AITER: `PREBUILD_KERNELS=1 python setup.py develop`.

VLLM_ROCM_USE_AITER=1 VLLM_USE_V1=1 vllm serve /models/models--amd--Llama-3.1-405B-Instruct-FP8-KV/snapshots/2505537398e7cfda52f6d666f315c03db8e4697c/ --tensor-parallel-size 8 --gpu-memory-utilization 0.9 --trust-remote-code --disable-log-requests --block-size 128 --max-model-len 32768 --dtype float16 --quantization fp8 --no-enable-prefix-caching

@shajrawi (Collaborator) left a comment


Can you please direct the PR to the upstream vllm (https://github.com/vllm-project/vllm.git) instead of ROCm/vllm?

@tjtanaavllm

@fsx950223

In your upstream PR, please use direct_register_custom_op from vllm.utils https://github.com/vllm-project/vllm/blob/4c41278b77a8b14fbab6dfeb95d5185c12018fbc/vllm/utils.py#L2002

torch.library.custom_op can have significant overhead because it needs to consider complicated dispatching logic.

"""
    `torch.library.custom_op` can have significant overhead because it
    needs to consider complicated dispatching logic. This function
    directly registers a custom op and dispatches it to the CUDA backend.
    See https://gist.github.com/youkaichao/ecbea9ec9fc79a45d2adce1784d7a9a5
    for more details.

    By default, the custom op is registered to the vLLM library. If you
    want to register it to a different library, you can pass the library
    object to the `target_lib` argument.

    IMPORTANT: the lifetime of the operator is tied to the lifetime of the
    library object. If you want to bind the operator to a different library,
    make sure the library object is alive when the operator is used.
    """

In the upstream PR, performance gain values and lm_eval results have to be attached alongside in the PR description.

@tjtanaavllm

@fsx950223 FYI, upstream refers to https://github.com/vllm-project/vllm, and may I know which AITER version you are using?

@fsx950223 (Author)

> @fsx950223 FYI, upstream refers to https://github.com/vllm-project/vllm, and may I know which AITER version you are using?

8d167e698fb5ecf54d1315e2cae0da6c6a2746b5

@tjtanaavllm commented Apr 22, 2025

> @fsx950223 FYI, upstream refers to https://github.com/vllm-project/vllm, and may I know which AITER version you are using?
>
> 8d167e698fb5ecf54d1315e2cae0da6c6a2746b5

Is it from a branch? I couldn't find the commit on the main branch of AITER.
