Broken tests on RX 6950 XT #670

Open
ffrancesco94 opened this issue Sep 5, 2024 · 4 comments

@ffrancesco94

Hi,
I have been having issues with some downstream packages that use AMDGPU.jl, so I tried to narrow the problem down. I am running Manjaro with an RX 6950 XT and ROCm 6.1 from the Arch repositories, on Julia 1.10.4 installed via juliaup. When I run `Pkg.test("AMDGPU")`, I get the following output:

Test Summary:                               |  Pass  Fail  Error  Broken  Total      Time
AMDGPU                                      | 13173     2      2     151  13328  10m11.5s
  test                                      | 13173     2      2     151  13328          
    test/core_tests.jl                      |   615     1                   616          
      core                                  |   615     1                   616   1m14.8s
        Functional                          |     2                           2      0.1s
        HIPDevice                           |     8                           8      0.0s
        ISA parsing                         |    10                          10      0.0s
        Exception holder                    |                              None      2.1s
        Comparison                          |     3                           3      0.0s
        Synchronization                     |     1                           1      5.3s
        Trapping                            |     2                           2      0.0s
        Base                                |   557     1                   558     55.8s
          Specifying buffer type            |     4                           4      0.0s
          ones/zeros                        |     2                           2      1.2s
          view                              |    10                          10      1.7s
          resize!                           |     3                           3      0.3s
          unsafe_wrap                       |    17                          17      3.9s
          unsafe_free                       |                              None      0.0s
          accumulate                        |    25                          25      6.4s
          Atomics                           |     1                           1      0.3s
          Sorting                           |   384                         384     31.2s
          Reverse kernel                    |    88                          88      2.9s
          Selection                         |     3                           3      1.6s
          Multi-GPU                         |    20     1                    21      3.2s
            Device switching                |     7                           7      0.2s
            Arrays                          |     5                           5      1.0s
            Copying                         |     1                           1      0.8s
            Kernel                          |     1     1                     2      1.0s
            Correctly switching HIP context |     6                           6      0.3s
        broadcast                           |    18                          18      6.4s
        Ref Broadcast                       |     1                           1      0.5s
        Broadcast Fix                       |     2                           2      0.7s
        Broadcast Ref{<:Type}               |     1                           1      0.3s
        Device                              |     3                           3      0.0s
        Stream                              |     7                           7      0.3s
    test/device_tests.jl                    |   473                    9    482          
    test/external_tests.jl                  |    18                          18          
    test/gpuarrays_tests.jl                 |  7213                        7213          
    test/hip_core_tests.jl                  |     4            1              5          
      hip - core                            |     4            1              5      2.3s
        AMDGPU.@elapsed                     |     4                           4      0.6s
        HIP Peer Access                     |                  1              1      0.4s
    test/hip_miopen_tests.jl                |                  1              1          
      hip - MIOpen                          |                  1              1      0.0s
    test/hip_rocblas_tests.jl               |   672     1                   673          
      hip - rocBLAS                         |   672     1                   673   1m08.2s
        BLAS                                |   672     1                   673   1m05.4s
          Build Information                 |     1                           1      0.2s
          Highlevel                         |     2                           2      3.8s
          Level 1                           |    51     1                    52     10.6s
            T = Float32                     |    13                          13      1.0s
            T = Float64                     |    13                          13      0.7s
            T = ComplexF32                  |    12     1                    13      7.5s
            T = ComplexF64                  |    13                          13      1.4s
          Level 2                           |   172                         172     12.7s
          Level 3                           |   446                         446     38.1s
    test/hip_rocfft_tests.jl                |   199                         199          
    test/hip_rocrand_tests.jl               |   141                         141          
    test/hip_rocsolver_tests.jl             |   538                         538          
    test/hip_rocsparse_tests.jl             |  1099                  136   1235          
    test/ka_tests.jl                        |  2201                    6   2207          
ERROR: LoadError: Some tests did not pass: 13173 passed, 2 failed, 2 errored, 151 broken.
in expression starting at /home/fra/.julia/packages/AMDGPU/a1v0k/test/runtests.jl:107
ERROR: Package AMDGPU errored during testing

Is this expected behaviour? I also have an integrated APU (which I am not currently using), which might be why some of the Multi-GPU tests fail.

@pxl-th
Member

pxl-th commented Sep 5, 2024

I don't think ROCm supports the integrated APU, but since it is visible, the test suite tries to run the multi-GPU tests on it.
If you hide it with `HIP_VISIBLE_DEVICES` and some tests still fail, please share the error messages for those.
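A minimal sketch of doing this from Julia, assuming the discrete RX 6950 XT is HIP device `0` on this machine. The ordering is not guaranteed, so check which index the APU gets before relying on it. The helper name `amdgpu_test_cmd` is hypothetical: it only builds a command that runs the AMDGPU.jl test suite in a fresh Julia process with `HIP_VISIBLE_DEVICES` set, so the variable is already in place when the HIP runtime initializes in that process.

```julia
# Hypothetical helper: build a command that runs the AMDGPU.jl test suite
# in a fresh Julia process with only the selected HIP device(s) visible.
# Assumption: device "0" is the discrete GPU; adjust if the APU comes first.
function amdgpu_test_cmd(visible::AbstractString = "0")
    # Reuse the currently running Julia binary for the child process.
    cmd = `$(Base.julia_cmd()) -e 'using Pkg; Pkg.test("AMDGPU")'`
    # Add HIP_VISIBLE_DEVICES on top of the inherited environment, so it is
    # set before HIP initializes in the child process.
    return addenv(cmd, "HIP_VISIBLE_DEVICES" => visible)
end

# Launch the tests (this blocks until the suite finishes):
# run(amdgpu_test_cmd("0"))
```

Running the suite in a separate process keeps the setting from leaking into the current session, where AMDGPU.jl may already have initialized HIP with every device visible.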

@ffrancesco94
Author

Sorry for the long delay! With the multi-GPU tests excluded, I get one failure, one error and 153 broken tests. The error comes from MIOpen:

MIOpen Error: /usr/src/debug/miopen-hip/MIOpen-rocm-6.0.2/src/ocl/convolutionocl.cpp:129: Invalid filter channel number
MIOpen(HIP): Warning [ValidateGroupCount] NCHWw {10, 4, 10, 10}, x {4, 2, 2, 2}, groups = 2
MIOpen Error: /usr/src/debug/miopen-hip/MIOpen-rocm-6.0.2/src/ocl/convolutionocl.cpp:129: Invalid filter channel number
MIOpen Error: /usr/src/debug/miopen-hip/MIOpen-rocm-6.0.2/src/convolution.cpp:271: Channels do not match for the filter
/usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/stl_vector.h:1144: const_reference std::vector<unsigned long>::operator[](size_type) const [_Tp = unsigned long, _Alloc = std::allocator<unsigned long>]: Assertion '__n < this->size()' failed.

The failure is in the rocBLAS tests, with `T = ComplexF32`.

@pxl-th
Member

pxl-th commented Sep 24, 2024

These MIOpen errors most likely occur during the algorithm search and are not fatal.
They usually mean there is no suitable algorithm for the current backend (OpenCL), so MIOpen moves on to other backends (ASM, HIP).

For the other errors, it would help if you could post the full test summary printed at the end (including stack traces), since almost all rocBLAS functions are tested with the ComplexF32 type.

@pxl-th
Member

pxl-th commented Sep 24, 2024

> MIOpen Error: /usr/src/debug/miopen-hip/MIOpen-rocm-6.0.2/src/ocl/convolutionocl.cpp:129: Invalid filter channel number

Actually, this was a bug in the forward-convolution workspace calculation.
Fixed by #678.
