[UR][CUDA] enqueue//enqueue-test/urEnqueueKernelLaunchIncrementMultiDeviceTest fails #19033

Open
@yingcong-wu

Describe the bug

The CI job Unified Runtime Pre Commit / Adapters (CUDA, UR_CUDA, -u 1001 --privileged --cap-add SYS_ADM failed and still fails after multiple reruns.

Failures:

FAIL: Unified Runtime Conformance :: enqueue//enqueue-test/24/31 (101 of 492)
******************** TEST 'Unified Runtime Conformance :: enqueue//enqueue-test/24/31' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test-Unified Runtime Conformance-6017-24-31.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=31 GTEST_SHARD_INDEX=24 /__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test
--

Script:
--
/__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test --gtest_filter=urEnqueueKernelLaunchIncrementMultiDeviceTest.Success/UR_BACKEND_CUDA__NVIDIA_CUDA_BACKEND_ID0__NoUseEventWaitRunBackgroundCheck
--
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 4
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 5
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 6
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 7
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 8
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 9
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:330: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3

/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 4
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 5
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 6
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 7
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 8
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 9
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:330
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3


********************
Testing:  0.. 10.. 
FAIL: Unified Runtime Conformance :: enqueue//enqueue-test/23/31 (102 of 492)
******************** TEST 'Unified Runtime Conformance :: enqueue//enqueue-test/23/31' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test-Unified Runtime Conformance-6017-23-31.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=31 GTEST_SHARD_INDEX=23 /__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test
--

Script:
--
/__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test --gtest_filter=urEnqueueKernelLaunchIncrementMultiDeviceTest.Success/UR_BACKEND_CUDA__NVIDIA_CUDA_BACKEND_ID0__UseEventWaitNoRunBackgroundCheck
--
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:330: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3

/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:330
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3


********************
Testing:  0.. 10.. 
FAIL: Unified Runtime Conformance :: enqueue//enqueue-test/22/31 (107 of 492)
******************** TEST 'Unified Runtime Conformance :: enqueue//enqueue-test/22/31' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test-Unified Runtime Conformance-6017-22-31.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=31 GTEST_SHARD_INDEX=22 /__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test
--

Script:
--
/__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test --gtest_filter=urEnqueueKernelLaunchIncrementMultiDeviceTest.Success/UR_BACKEND_CUDA__NVIDIA_CUDA_BACKEND_ID0__UseEventWaitRunBackgroundCheck
--
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 4
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 5
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 6
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 7
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 8
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 9
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:330: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3

/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 4
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 5
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 6
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 7
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 8
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:116
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 9
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:330
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3


********************
Testing:  0.. 10.. 
FAIL: Unified Runtime Conformance :: enqueue//enqueue-test/25/31 (108 of 492)
******************** TEST 'Unified Runtime Conformance :: enqueue//enqueue-test/25/31' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test-Unified Runtime Conformance-6017-25-31.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=31 GTEST_SHARD_INDEX=25 /__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test
--

Script:
--
/__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test --gtest_filter=urEnqueueKernelLaunchIncrementMultiDeviceTest.Success/UR_BACKEND_CUDA__NVIDIA_CUDA_BACKEND_ID0__NoUseEventWaitNoRunBackgroundCheck
--
/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:330: Failure
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3

/__w/llvm/llvm/unified-runtime/test/conformance/enqueue/urEnqueueKernelLaunchAndMemcpyInOrder.cpp:330
Expected equality of these values:
  reinterpret_cast<uint32_t *>(SharedMem[i])[j]
    Which is: 2
  ExpectedValue
    Which is: 3


********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. [90](https://github.com/intel/llvm/actions/runs/15703193260/job/44305438571?pr=19009#step:12:91).. 
********************
Failed Tests (4):
  Unified Runtime Conformance :: enqueue//enqueue-test/urEnqueueKernelLaunchIncrementMultiDeviceTest/Success/UR_BACKEND_CUDA__NVIDIA_CUDA_BACKEND_ID0__NoUseEventWaitNoRunBackgroundCheck
  Unified Runtime Conformance :: enqueue//enqueue-test/urEnqueueKernelLaunchIncrementMultiDeviceTest/Success/UR_BACKEND_CUDA__NVIDIA_CUDA_BACKEND_ID0__NoUseEventWaitRunBackgroundCheck
  Unified Runtime Conformance :: enqueue//enqueue-test/urEnqueueKernelLaunchIncrementMultiDeviceTest/Success/UR_BACKEND_CUDA__NVIDIA_CUDA_BACKEND_ID0__UseEventWaitNoRunBackgroundCheck
  Unified Runtime Conformance :: enqueue//enqueue-test/urEnqueueKernelLaunchIncrementMultiDeviceTest/Success/UR_BACKEND_CUDA__NVIDIA_CUDA_BACKEND_ID0__UseEventWaitRunBackgroundCheck
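
The four failures listed above can probably be rerun one at a time with the same `--gtest_filter` form that appears in the "Script:" lines of the log. The sketch below only prints the commands. The binary path is the one from the CI container and will likely differ in a local build tree; the `ENQUEUE_TEST` variable and the `repro_commands` helper are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Path copied from the CI log; override ENQUEUE_TEST for a local build (assumption).
ENQUEUE_TEST="${ENQUEUE_TEST:-/__w/llvm/llvm/build/test/conformance/enqueue/enqueue-test}"

# Common prefix of the four failing parameterizations, as shown in the log.
PREFIX="urEnqueueKernelLaunchIncrementMultiDeviceTest.Success/UR_BACKEND_CUDA__NVIDIA_CUDA_BACKEND_ID0__"

# Print (do not run) one command per failing variant from the "Failed Tests" list.
repro_commands() {
  local variant
  for variant in \
      NoUseEventWaitNoRunBackgroundCheck \
      NoUseEventWaitRunBackgroundCheck \
      UseEventWaitNoRunBackgroundCheck \
      UseEventWaitRunBackgroundCheck; do
    printf '%s --gtest_filter=%s%s\n' "$ENQUEUE_TEST" "$PREFIX" "$variant"
  done
}

repro_commands
```

Each printed line can be pasted into a shell on a machine with the CUDA adapter built. Running a single variant at a time keeps the output short enough to compare against the log above.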
