fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu #4428

Superjomn · 2025-05-19T03:15:25Z

PR title

Please write the PR title using the following template:

[JIRA ticket link/nvbug link/github issue link][fix/feat/doc/infra/...] <summary of this PR>

For example, a PR that adds a new cache manager feature tracked by Jira ticket TRTLLM-1000 would be titled:

[TRTLLM-1000][feat] Support a new feature about cache manager
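The template above can be checked mechanically. The following is a minimal sketch, assuming the two bracketed groups contain no nested brackets and that the type is a single lowercase word. The regex and the helper name `matches_title_template` are hypothetical illustrations and are not part of the repository's tooling.

```python
import re

# Hypothetical pattern for "[ticket][type] summary" (an assumption, not repo tooling).
TITLE_RE = re.compile(
    r"^\[(?P<ticket>[^\[\]]+)\]"   # [JIRA ticket / nvbug / GitHub issue]
    r"\[(?P<type>[a-z]+)\]"        # [fix], [feat], [doc], [infra], ...
    r"\s+(?P<summary>\S.*)$"       # non-empty summary
)


def matches_title_template(title: str) -> bool:
    """Return True if the title has the shape '[ticket][type] summary'."""
    return TITLE_RE.match(title) is not None


print(matches_title_template(
    "[TRTLLM-1000][feat] Support a new feature about cache manager"
))  # True
```

Under this sketch a title must start with the bracketed ticket, so a title that puts the type before the bracket, such as `fix[nvbug/...]: ...`, would not match.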

Description

Please briefly explain the issue and the solution.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
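As an illustration of how the `run` options above combine into a single PR comment, here is a minimal sketch. The function `build_bot_run_comment` and its parameters are hypothetical. Only the flag names come from the help text, and only a subset of them is covered.

```python
from typing import Optional, Sequence


def build_bot_run_comment(
    *,
    only_multi_gpu_test: bool = False,
    disable_fail_fast: bool = False,
    stage_list: Optional[Sequence[str]] = None,
    gpu_type: Optional[Sequence[str]] = None,
) -> str:
    """Assemble a '/bot run ...' comment from a subset of the documented flags."""
    parts = ["/bot", "run"]
    if only_multi_gpu_test:
        parts.append("--only-multi-gpu-test")
    if disable_fail_fast:
        parts.append("--disable-fail-fast")
    if stage_list:
        # List values are passed as one quoted, comma-separated string.
        parts += ["--stage-list", '"' + ", ".join(stage_list) + '"']
    if gpu_type:
        parts += ["--gpu-type", '"' + ", ".join(gpu_type) + '"']
    return " ".join(parts)


print(build_bot_run_comment(only_multi_gpu_test=True, disable_fail_fast=True))
# /bot run --only-multi-gpu-test --disable-fail-fast
```

The printed comment is the same form as the one posted later in this conversation to rerun only the multi-GPU tests without fail-fast.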

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, because insufficient care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, because insufficient care and validation can break the top of tree.

Superjomn · 2025-05-19T03:15:43Z

/bot run --only-multi-gpu-test

tensorrt-cicd · 2025-05-19T03:21:49Z

PR_Github #5664 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-19T15:41:09Z

PR_Github #5664 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #4137 (Partly Tested) completed with status: 'FAILURE'

Superjomn · 2025-05-20T01:54:32Z

/bot run --only-multi-gpu-test

tensorrt-cicd · 2025-05-20T01:59:55Z

PR_Github #5797 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-20T04:49:14Z

PR_Github #5797 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4244 (Partly Tested) completed with status: 'FAILURE'

Signed-off-by: Superjomn <[email protected]>

Superjomn · 2025-05-20T06:29:26Z

/bot run --only-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2025-05-20T06:34:55Z

PR_Github #5833 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-20T11:19:12Z

PR_Github #5833 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4272 (Partly Tested) completed with status: 'SUCCESS'

Superjomn · 2025-05-20T12:04:55Z

/bot skip --comment "only affect multi-gpu and the tests passed"

tensorrt-cicd · 2025-05-20T12:10:25Z

PR_Github #5873 [ skip ] triggered by Bot

tensorrt-cicd · 2025-05-20T12:16:12Z

PR_Github #5873 [ skip ] completed with state SUCCESS
Skipping testing for commit 3248652

…VIDIA#4428) * add test Signed-off-by: Superjomn <[email protected]> * fix Signed-off-by: Superjomn <[email protected]> --------- Signed-off-by: Superjomn <[email protected]>

Superjomn changed the title ~~fix: trtllm-llmapi-launch on single node single gpu~~ fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu May 19, 2025

kaiyux approved these changes May 19, 2025

Superjomn force-pushed the fix-mgmn-1n-1g branch from 35346d1 to df8525c Compare May 20, 2025 01:54

Superjomn added 2 commits May 20, 2025 14:29

add test

8f99b07

Signed-off-by: Superjomn <[email protected]>

fix

3248652

Signed-off-by: Superjomn <[email protected]>

Superjomn force-pushed the fix-mgmn-1n-1g branch from df8525c to 3248652 Compare May 20, 2025 06:29

Superjomn enabled auto-merge (squash) May 20, 2025 12:05

Superjomn merged commit 174c518 into NVIDIA:main May 20, 2025
3 checks passed

Superjomn deleted the fix-mgmn-1n-1g branch May 21, 2025 08:20

Superjomn added a commit that referenced this pull request May 27, 2025

fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#…

1011941

…4529) fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4428)
