develop branch broken on oneAPI #2124

Closed
fwyzard opened this issue Sep 3, 2023 · 22 comments · Fixed by #2125, #2127 or #2161
Comments

@fwyzard
Contributor

fwyzard commented Sep 3, 2023

(initially reported by @AuroraPerego)

Hi,
looks like the current HEAD of the develop branch is broken for the SYCL/oneAPI GPU target:

84% tests passed, 7 tests failed out of 43

Total Test time (real) =  70.06 sec

The following tests FAILED:
          8 - parallelLoopPatterns (Subprocess aborted)
         10 - randomCells2D (Failed)
         17 - matMulTest (Failed)
         32 - memBufTest (Not Run)
         33 - bufSlicingTest (Subprocess aborted)
         36 - memViewTest (Subprocess aborted)
         42 - warpTest (Failed)

This is on an Intel Data Center GPU Max 1100 (Ponte Vecchio) with oneAPI DPC++/C++ Compiler 2023.2.0 (or 2023.2.1, there seems to be some confusion with minor versions):

$ git log --oneline -n1
59002235d6d (HEAD -> develop, origin/develop, origin/HEAD) Fix unsigned integer conversion

$ mkdir -p build/sycl_gpu

$ cd build/sycl_gpu

$ CXXFLAGS="-g -O2" cmake \
  -G 'Unix Makefiles' \
  -DCMAKE_CXX_COMPILER=/opt/intel/oneapi/compiler/latest/linux/bin/icpx \
  -DoneDPL_ROOT=/opt/intel/oneapi/dpl/latest \
  -DoneDPL_DIR=/opt/intel/oneapi/dpl/latest/lib/cmake/oneDPL \
  -DTBB_ROOT=/opt/intel/oneapi/tbb/latest \
  -DTBB_DIR=/opt/intel/oneapi/tbb/latest/lib/cmake/tbb \
  -DBOOST_ROOT=~/local/boost/ \
  --log-level=VERBOSE \
  -Dalpaka_DEBUG=2 \
  -Dalpaka_BUILD_EXAMPLES=ON \
  -Dalpaka_CHECK_HEADERS=ON \
  -DBUILD_TESTING=ON \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_VERBOSE_MAKEFILE=ON \
  -Dalpaka_ACC_CPU_B_SEQ_T_SEQ_ENABLE=ON \
  -Dalpaka_ACC_SYCL_ENABLE=ON \
  -Dalpaka_SYCL_PLATFORM_ONEAPI=ON \
  -Dalpaka_SYCL_ONEAPI_CPU=OFF \
  -Dalpaka_SYCL_ONEAPI_GPU=ON \
  -Dalpaka_SYCL_ONEAPI_GPU_DEVICES='intel_gpu_pvc' \
  -Dalpaka_DISABLE_VENDOR_RNG=ON \
  -L \
  ../../

$ make -j8 -k

$ make test

memBufTest

zeroDimBufferTest fails to build with

$ make
...
[ 80%] Building CXX object test/integ/zeroDimBuffer/CMakeFiles/zeroDimBufferTest.dir/src/zeroDimBuffer.cpp.o
cd /home/u106132/alpaka/build/sycl_gpu/test/integ/zeroDimBuffer && /opt/intel/oneapi/compiler/latest/linux/bin/icpx -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED -DALPAKA_ACC_SYCL_ENABLED -DALPAKA_BLOCK_SHARED_DYN_MEMBER_ALLOC_KIB=47 -DALPAKA_DEBUG=2 -DALPAKA_DISABLE_VENDOR_RNG -DALPAKA_OFFLOAD_MAX_BLOCK_SIZE="" -DALPAKA_SYCL_ONEAPI_GPU -DALPAKA_SYCL_TARGET_GPU -DBOOST_ATOMIC_DYN_LINK -DBOOST_ATOMIC_NO_LIB -I/home/u106132/alpaka/include -isystem /home/u106132/alpaka/thirdParty/catch2/src/catch2/.. -isystem /home/u106132/alpaka/build/sycl_gpu/thirdParty/catch2/generated-includes -isystem /home/u106132/local/boost/include -g -O2 -g -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-global-constructors -Wno-padded -Wno-extra-semi-stmt -ffp-model=precise -Wno-disabled-macro-expansion -Wno-unsafe-buffer-usage -O0 -fsycl -sycl-std=2020 -fsycl-targets=intel_gpu_pvc -fsycl-unnamed-lambda -MD -MT test/integ/zeroDimBuffer/CMakeFiles/zeroDimBufferTest.dir/src/zeroDimBuffer.cpp.o -MF CMakeFiles/zeroDimBufferTest.dir/src/zeroDimBuffer.cpp.o.d -o CMakeFiles/zeroDimBufferTest.dir/src/zeroDimBuffer.cpp.o -c /home/u106132/alpaka/test/integ/zeroDimBuffer/src/zeroDimBuffer.cpp
In file included from /home/u106132/alpaka/test/integ/zeroDimBuffer/src/zeroDimBuffer.cpp:9:
In file included from /home/u106132/alpaka/include/alpaka/alpaka.hpp:13:
In file included from /home/u106132/alpaka/include/alpaka/acc/AccCpuOmp2Blocks.hpp:24:
In file included from /home/u106132/alpaka/include/alpaka/workdiv/WorkDivMembers.hpp:8:
/home/u106132/alpaka/include/alpaka/extent/Traits.hpp:92:5: error: static assertion failed due to requirement 'integral_constant<unsigned long, 0>::value >= 1'
    static_assert(Dim<TExtent>::value >= 1);
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~
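
For readers unfamiliar with this kind of error: the static assertion fires because an accessor that only makes sense for one or more dimensions is instantiated with a zero-dimensional extent. The sketch below is not alpaka code; `width` and `widthOrOne` are hypothetical names, and `std::array` stands in for the extent type. It shows how such an assertion triggers and one way a caller can avoid instantiating it (assuming C++17 for `if constexpr`).

```cpp
#include <array>
#include <cstddef>

// Hypothetical accessor in the spirit of the failing line: the innermost
// extent only exists for dimensionality >= 1.
template<typename T, std::size_t N>
constexpr T width(std::array<T, N> const& extent)
{
    static_assert(N >= 1, "width() needs at least one dimension");
    return extent[N - 1];
}

// Guarded caller: a zero-dimensional extent is treated as a single element,
// so width() is never instantiated with N == 0.
template<typename T, std::size_t N>
constexpr T widthOrOne(std::array<T, N> const& extent)
{
    if constexpr(N >= 1)
        return width(extent);
    else
        return T{1};
}
```

Calling `width` directly on a `std::array<int, 0>` would reproduce the compile error, while `widthOrOne` compiles for any dimensionality.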

parallelLoopPatterns

parallelLoopPatterns fails at runtime with an assertion:

$ ./example/parallelLoopPatterns/parallelLoopPatterns
...
parallelLoopPatterns: /home/u106132/alpaka/include/alpaka/mem/buf/sycl/Set.hpp:53: alpaka::detail::TaskSetSyclBase<std::integral_constant<unsigned long, 1>, alpaka::BufGenericSycl<float, std::integral_constant<unsigned long, 1>, unsigned int, alpaka::PlatformGenericSycl<alpaka::detail::IntelGpuSelector>>, alpaka::Vec<std::integral_constant<unsigned long, 1>, unsigned int>>::TaskSetSyclBase(TViewFwd &&, const std::uint8_t &, const TExtent &) [TDim = std::integral_constant<unsigned long, 1>, TView = alpaka::BufGenericSycl<float, std::integral_constant<unsigned long, 1>, unsigned int, alpaka::PlatformGenericSycl<alpaka::detail::IntelGpuSelector>>, TExtent = alpaka::Vec<std::integral_constant<unsigned long, 1>, unsigned int>, TViewFwd = alpaka::BufGenericSycl<float, std::integral_constant<unsigned long, 1>, unsigned int, alpaka::PlatformGenericSycl<alpaka::detail::IntelGpuSelector>> &]: Assertion `m_extentWidthBytes <= m_dstPitchBytes[TDim::value - 1u]' failed.
Aborted (core dumped)

bufSlicingTest

bufSlicingTest fails at runtime with an assertion:

$ ./test/unit/mem/copy/bufSlicingTest
...
bufSlicingTest: /home/u106132/alpaka/include/alpaka/mem/buf/sycl/Set.hpp:53: alpaka::detail::TaskSetSyclBase<std::integral_constant<unsigned long, 1>, alpaka::ViewSubView<alpaka::DevGenericSycl<alpaka::PlatformGenericSycl<alpaka::detail::IntelGpuSelector>>, int, std::integral_constant<unsigned long, 1>, long>, alpaka::Vec<std::integral_constant<unsigned long, 1>, long>>::TaskSetSyclBase(TViewFwd &&, const std::uint8_t &, const TExtent &) [TDim = std::integral_constant<unsigned long, 1>, TView = alpaka::ViewSubView<alpaka::DevGenericSycl<alpaka::PlatformGenericSycl<alpaka::detail::IntelGpuSelector>>, int, std::integral_constant<unsigned long, 1>, long>, TExtent = alpaka::Vec<std::integral_constant<unsigned long, 1>, long>, TViewFwd = alpaka::ViewSubView<alpaka::DevGenericSycl<alpaka::PlatformGenericSycl<alpaka::detail::IntelGpuSelector>>, int, std::integral_constant<unsigned long, 1>, long> &]: Assertion `m_extentWidthBytes <= m_dstPitchBytes[TDim::value - 1u]' failed.
-------------------------------------------------------------------------------
memBufSlicingMemsetTest - TestAccWithDataTypes - 1
-------------------------------------------------------------------------------
/home/u106132/alpaka/test/unit/mem/copy/src/BufSlicing.cpp:170
...............................................................................

/home/u106132/alpaka/test/unit/mem/copy/src/BufSlicing.cpp:170: FAILED:
due to a fatal error condition:
  SIGABRT - Abort (abnormal termination) signal

===============================================================================
test cases:   74 |   49 passed | 25 failed
assertions: 3105 | 3080 passed | 25 failed

Aborted (core dumped)

memViewTest

memViewTest fails at runtime with the same assertion:

$ ./test/unit/mem/view/memViewTest
...
memViewTest: /home/u106132/alpaka/include/alpaka/mem/buf/sycl/Set.hpp:53: alpaka::detail::TaskSetSyclBase<std::integral_constant<unsigned long, 1>, alpaka::ViewPlainPtr<alpaka::DevGenericSycl<alpaka::PlatformGenericSycl<alpaka::detail::IntelGpuSelector>>, float, std::integral_constant<unsigned long, 1>, long>, alpaka::Vec<std::integral_constant<unsigned long, 1>, long>>::TaskSetSyclBase(TViewFwd &&, const std::uint8_t &, const TExtent &) [TDim = std::integral_constant<unsigned long, 1>, TView = alpaka::ViewPlainPtr<alpaka::DevGenericSycl<alpaka::PlatformGenericSycl<alpaka::detail::IntelGpuSelector>>, float, std::integral_constant<unsigned long, 1>, long>, TExtent = alpaka::Vec<std::integral_constant<unsigned long, 1>, long>, TViewFwd = alpaka::ViewPlainPtr<alpaka::DevGenericSycl<alpaka::PlatformGenericSycl<alpaka::detail::IntelGpuSelector>>, float, std::integral_constant<unsigned long, 1>, long> &]: Assertion `m_extentWidthBytes <= m_dstPitchBytes[TDim::value - 1u]' failed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
memViewTest is a Catch2 v3.3.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
viewPlainPtrTest - alpaka::test::TestAccs - 1
-------------------------------------------------------------------------------
/home/u106132/alpaka/test/unit/mem/view/src/ViewPlainPtrTest.cpp:97
...............................................................................

/home/u106132/alpaka/test/unit/mem/view/src/ViewPlainPtrTest.cpp:97: FAILED:
  {Unknown expression after the reported line}
due to a fatal error condition:
  SIGABRT - Abort (abnormal termination) signal

===============================================================================
test cases:   26 |   25 passed | 1 failed
assertions: 1494 | 1493 passed | 1 failed

Aborted (core dumped)

randomCells2D

randomCells2D fails at runtime:

$ ./example/randomCells2D/randomCells2D
...
Number of cells: 26797
Number of calculations per cell: 256
Total number of calculations: 6860032
Mean value A: 0.00628226 (should converge to 0.5)
Mean value B: 0.00628699 (should converge to 0.5)
Maximum error expected at 6860032 calculations should be around 0.0001909
Convergence test failed!
...

$ echo $?
1

matMulTest

matMulTest fails at runtime:

$ ./test/integ/matMul/matMulTest
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
matMulTest is a Catch2 v3.3.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
matMul - TestAccs - 1
-------------------------------------------------------------------------------
/home/u106132/alpaka/test/integ/matMul/src/matMul.cpp:153
...............................................................................

/home/u106132/alpaka/test/integ/matMul/src/matMul.cpp:306: FAILED:
  REQUIRE( resultCorrect )
with expansion:
  false

[+] ~BufCpuImpl
[-] ~BufCpuImpl
===============================================================================
test cases: 2 | 1 passed | 1 failed
assertions: 8 | 7 passed | 1 failed

warpTest

warpTest fails at runtime:

$ ./test/unit/warp/warpTest
...
-------------------------------------------------------------------------------
shfl - alpaka::test::TestAccs - 23
-------------------------------------------------------------------------------
/home/u106132/alpaka/test/unit/warp/src/Shfl.cpp:97
...............................................................................

/home/u106132/alpaka/test/unit/warp/src/Shfl.cpp:134: FAILED:
  REQUIRE( fixture(ShflMultipleThreadWarpTestKernel<16>{}) )
with expansion:
  false

===============================================================================
test cases: 144 |  84 passed |  60 failed
assertions: 732 | 108 passed | 624 failed
@fwyzard
Contributor Author

fwyzard commented Sep 3, 2023

As a cross-check, I get different failures using

  -DCMAKE_BUILD_TYPE=Release \
  -Dalpaka_DISABLE_VENDOR_RNG=OFF \

88% tests passed, 5 tests failed out of 43

Total Test time (real) =  42.99 sec

The following tests FAILED:
         17 - matMulTest (Failed)
         30 - kernelTest (Failed)
         32 - memBufTest (Not Run)
         33 - bufSlicingTest (Failed)
         36 - memViewTest (Failed)

matMulTest fails with the same message as before.

bufSlicingTest and memViewTest fail, but do not assert. Is it because asserts are disabled in release mode?

kernelTest is new.

parallelLoopPatterns passes. Is it because asserts are disabled in release mode?

randomCells2D does not complain, but it doesn't seem to give a correct result either (see below).

warpTest passes without issues.
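
On the assert question: the standard `assert` macro expands to nothing when `NDEBUG` is defined, and CMake's default Release flags typically include `-DNDEBUG` for GCC/Clang-style compilers. A minimal sketch of the effect follows; `pitchCheckIsActive` is a hypothetical stand-in, not the check in Set.hpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a pitch check; not alpaka code.
// Returns true if the assert expression was evaluated, false if it was compiled out.
inline bool pitchCheckIsActive(std::size_t extentWidthBytes, std::size_t pitchBytes)
{
    bool evaluated = false;
    // With NDEBUG defined, assert(expr) expands to ((void)0) and expr is never evaluated.
    assert((evaluated = true, extentWidthBytes <= pitchBytes));
    return evaluated;
}
```

Built without `NDEBUG`, `pitchCheckIsActive(4, 8)` returns true; built with `-DNDEBUG`, it returns false, and a violated condition would go unnoticed instead of aborting.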

bufSlicingTest

bufSlicingTest fails at run time:

$ ./test/unit/mem/copy/bufSlicingTest
...
===============================================================================
test cases:  144 |  114 passed | 30 failed
assertions: 8550 | 8520 passed | 30 failed

memViewTest

memViewTest fails at run time:

$ ./test/unit/mem/view/memViewTest
...
===============================================================================
test cases:  176 |  152 passed | 24 failed
assertions: 2753 | 2729 passed | 24 failed

kernelTest

kernelTest fails at run time:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernelTest is a Catch2 v3.3.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
lambdaKernelWithArgumentIsWorking
-------------------------------------------------------------------------------
/home/u106132/alpaka/test/unit/kernel/src/KernelLambda.cpp:86
...............................................................................

/home/u106132/alpaka/test/unit/kernel/src/KernelLambda.cpp:57: FAILED:
  REQUIRE( fixture(kernel, arg) )
with expansion:
  false

randomCells2D

randomCells2D does not complain, but it doesn't seem to give a correct result either:

$ ./example/randomCells2D/randomCells2D
...
Number of cells: 26797
Number of calculations per cell: 256
Total number of calculations: 6860032
Mean value A: nan (should converge to 0.5)
Mean value B: 0.00628699 (should converge to 0.5)
Maximum error expected at 6860032 calculations should be around 0.0001909
Convergence test passed
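
The check used by randomCells2D is not shown in this thread, so the following only illustrates one way a NaN mean can slip past a convergence test: every ordered comparison with NaN is false, so a test written as "fail if the error exceeds the tolerance" accepts NaN. The function names are hypothetical, and this does not by itself explain why Mean value B passes.

```cpp
#include <cmath>
#include <limits>

// Hypothetical check: fails only if the error exceeds the tolerance.
// NaN compares false with everything, so a NaN mean is NOT flagged.
inline bool convergedNaive(float mean, float expected, float tolerance)
{
    bool const failed = std::fabs(mean - expected) > tolerance;
    return !failed;
}

// NaN-safe variant: passes only if the error is provably within the tolerance.
inline bool convergedStrict(float mean, float expected, float tolerance)
{
    return std::fabs(mean - expected) <= tolerance;
}
```

With a quiet NaN as the mean, `convergedNaive` reports success while `convergedStrict` reports failure; for ordinary finite values the two agree.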

fwyzard changed the title from "develop branch broken on oneAPI GPU" to "develop branch broken on oneAPI" on Sep 3, 2023
@fwyzard
Contributor Author

fwyzard commented Sep 3, 2023

The CPU tests were already broken, but they seem to be in much worse shape now:

The following tests FAILED:
          5 - kernelSpecialization (SEGFAULT)
          6 - monteCarloIntegration (Subprocess aborted)
         10 - randomCells2D (Failed)
         17 - matMulTest (Failed)
         21 - atomicTest (Failed)
         22 - blockSharedTest (Failed)
         23 - blockSharedSharingTest (Failed)
         24 - blockSyncTest (Failed)
         31 - mathTest (Failed)
         32 - memBufTest (Not Run)
         33 - bufSlicingTest (Failed)
         34 - fenceTest (SEGFAULT)
         36 - memViewTest (Failed)
         39 - randTest (Failed)
         42 - warpTest (Failed)

In particular memBufTest does not even build.

Tested with

$ CXXFLAGS="-g -O2" cmake \
  -G 'Unix Makefiles' \
  -DCMAKE_CXX_COMPILER=/opt/intel/oneapi/compiler/latest/linux/bin/icpx \
  -DoneDPL_ROOT=/opt/intel/oneapi/dpl/latest \
  -DoneDPL_DIR=/opt/intel/oneapi/dpl/latest/lib/cmake/oneDPL \
  -DTBB_ROOT=/opt/intel/oneapi/tbb/latest \
  -DTBB_DIR=/opt/intel/oneapi/tbb/latest/lib/cmake/tbb \
  -DBOOST_ROOT=~/local/boost/ \
  --log-level=VERBOSE \
  -Dalpaka_DEBUG=2 \
  -Dalpaka_BUILD_EXAMPLES=ON \
  -Dalpaka_CHECK_HEADERS=ON \
  -DBUILD_TESTING=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_VERBOSE_MAKEFILE=ON \
  -Dalpaka_ACC_CPU_B_SEQ_T_SEQ_ENABLE=ON \
  -Dalpaka_ACC_SYCL_ENABLE=ON \
  -Dalpaka_SYCL_ONEAPI_CPU=ON \
  -Dalpaka_SYCL_ONEAPI_GPU=OFF \
  -Dalpaka_DISABLE_VENDOR_RNG=OFF \
  -L \
  ../../

@AuroraPerego
Contributor

The commit that has broken the tests is this, related to #2093.

Before this commit:

  • GPU: 100% tests passed, 0 tests failed out of 29
  • CPU (runtime 2022.14.8.0): 100% tests passed, 0 tests failed out of 29
  • CPU (latest runtime): 86% tests passed, 4 tests failed out of 29
    The following tests FAILED:
            7 - atomicTest (Failed)
            8 - blockSharedTest (Failed)
            9 - blockSharedSharingTest (Failed)
           28 - warpTest (Failed)
    

After it:

  • GPU: 86% tests passed, 4 tests failed out of 29

    The following tests FAILED:
            3 - matMulTest (Failed)
           18 - memBufTest (Subprocess aborted)
           19 - bufSlicingTest (Subprocess aborted)
           22 - memViewTest (Subprocess aborted)
    
  • CPU (runtime 2022.14.8.0): 86% tests passed, 4 tests failed out of 29

    The following tests FAILED:
            3 - matMulTest (Failed)
           18 - memBufTest (Subprocess aborted)
           19 - bufSlicingTest (Subprocess aborted)
           22 - memViewTest (Subprocess aborted)
    
  • CPU (latest runtime): 72% tests passed, 8 tests failed out of 29

    The following tests FAILED:
            3 - matMulTest (Failed)
            7 - atomicTest (Failed)
            8 - blockSharedTest (Failed)
            9 - blockSharedSharingTest (Failed)
           18 - memBufTest (Subprocess aborted)
           19 - bufSlicingTest (Subprocess aborted)
           22 - memViewTest (Subprocess aborted)
           28 - warpTest (Failed)
    

Tested both on Intel(R) Data Center GPU Flex 170 and Intel(R) Data Center GPU Max 1100 with Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 with:

cmake     \
  -DCMAKE_CXX_COMPILER=/opt/intel/oneapi/compiler/latest/linux/bin/icpx   \
  -DBUILD_TESTING=ON   \
  -Dalpaka_ACC_SYCL_ENABLE=ON   \
  -Dalpaka_SYCL_ONEAPI_CPU=ON   \
  -DBoost_INCLUDE_DIR=/home/u108035/pixeltrack-standalone/external/boost/include  \
   ../../alpaka

or

cmake     \
  -DCMAKE_CXX_COMPILER=/opt/intel/oneapi/compiler/latest/linux/bin/icpx   \
  -DBUILD_TESTING=ON   \
  -Dalpaka_ACC_SYCL_ENABLE=ON   \
  -Dalpaka_SYCL_ONEAPI_GPU=ON   \
  -Dalpaka_SYCL_ONEAPI_GPU_DEVICES=intel_gpu_pvc  \
  -DBoost_INCLUDE_DIR=/home/u108035/pixeltrack-standalone/external/boost/include  \
   ../../alpaka

@AuroraPerego
Contributor

memBufTest
zeroDimBufferTest fails to build with

$ make
...
[ 80%] Building CXX object test/integ/zeroDimBuffer/CMakeFiles/zeroDimBufferTest.dir/src/zeroDimBuffer.cpp.o
cd /home/u106132/alpaka/build/sycl_gpu/test/integ/zeroDimBuffer && /opt/intel/oneapi/compiler/latest/linux/bin/icpx -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED -DALPAKA_ACC_SYCL_ENABLED -DALPAKA_BLOCK_SHARED_DYN_MEMBER_ALLOC_KIB=47 -DALPAKA_DEBUG=2 -DALPAKA_DISABLE_VENDOR_RNG -DALPAKA_OFFLOAD_MAX_BLOCK_SIZE="" -DALPAKA_SYCL_ONEAPI_GPU -DALPAKA_SYCL_TARGET_GPU -DBOOST_ATOMIC_DYN_LINK -DBOOST_ATOMIC_NO_LIB -I/home/u106132/alpaka/include -isystem /home/u106132/alpaka/thirdParty/catch2/src/catch2/.. -isystem /home/u106132/alpaka/build/sycl_gpu/thirdParty/catch2/generated-includes -isystem /home/u106132/local/boost/include -g -O2 -g -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-global-constructors -Wno-padded -Wno-extra-semi-stmt -ffp-model=precise -Wno-disabled-macro-expansion -Wno-unsafe-buffer-usage -O0 -fsycl -sycl-std=2020 -fsycl-targets=intel_gpu_pvc -fsycl-unnamed-lambda -MD -MT test/integ/zeroDimBuffer/CMakeFiles/zeroDimBufferTest.dir/src/zeroDimBuffer.cpp.o -MF CMakeFiles/zeroDimBufferTest.dir/src/zeroDimBuffer.cpp.o.d -o CMakeFiles/zeroDimBufferTest.dir/src/zeroDimBuffer.cpp.o -c /home/u106132/alpaka/test/integ/zeroDimBuffer/src/zeroDimBuffer.cpp
In file included from /home/u106132/alpaka/test/integ/zeroDimBuffer/src/zeroDimBuffer.cpp:9:
In file included from /home/u106132/alpaka/include/alpaka/alpaka.hpp:13:
In file included from /home/u106132/alpaka/include/alpaka/acc/AccCpuOmp2Blocks.hpp:24:
In file included from /home/u106132/alpaka/include/alpaka/workdiv/WorkDivMembers.hpp:8:
/home/u106132/alpaka/include/alpaka/extent/Traits.hpp:92:5: error: static assertion failed due to requirement 'integral_constant<unsigned long, 0>::value >= 1'
    static_assert(Dim<TExtent>::value >= 1);
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~

I do not see it in the tests, but I have seen pixeltrack fail in the same way after this commit.

@bernhardmgruber
Member

Thx for reporting this! From a visual inspection of the diff in PR #2093 there is a clear problem: No changes were made to the SYCL backend :) And since the change did not break API, the backend kept compiling. I will put up a quick PR with the required changes. However, I don't have a machine to test at the moment, since I am out of town.

@AuroraPerego
Contributor

Ok thanks! I can run the tests on the Intel GPU if needed :)

bernhardmgruber added a commit to bernhardmgruber/alpaka that referenced this issue Sep 4, 2023
These are changes missed as part of alpaka-group#2093.

Fixes: alpaka-group#2124
@bernhardmgruber
Member

Coming up: #2125. The CI says the changes compile for the SYCL backend. I have not run any tests though. Please give it a try, thx!

@AuroraPerego
Contributor

100% tests passed, 0 tests failed out of 29 :)

fwyzard pushed a commit that referenced this issue Sep 4, 2023
These are changes missed as part of #2093.

Fixes: #2124
fwyzard reopened this on Sep 5, 2023
@fwyzard
Contributor Author

fwyzard commented Sep 5, 2023

The current HEAD is better, but still has issues in debug mode (tested on a Ponte Vecchio GPU):

95% tests passed, 2 tests failed out of 43

Total Test time (real) =  78.49 sec

The following tests FAILED:
         32 - memBufTest (Not Run)
         42 - warpTest (Failed)

@AuroraPerego
Contributor

Regarding memBufTest, it fails to compile when:

-DCMAKE_BUILD_TYPE=Debug
-Dalpaka_DEBUG=2

are used to build the tests.
This has been happening since this commit: Simplify extent and offset APIs.

I've not looked into warpTest yet.

bernhardmgruber added a commit to bernhardmgruber/alpaka that referenced this issue Sep 5, 2023
This fixes a compilation error with the SYCL backend in debug mode.

Fixes: alpaka-group#2124
j-stephan pushed a commit that referenced this issue Sep 5, 2023
This fixes a compilation error with the SYCL backend in debug mode.

Fixes: #2124
j-stephan reopened this on Sep 5, 2023
@j-stephan
Member

Reopening for the warp test.

@SimeonEhrig
Member

With -Dalpaka_DEBUG=2 we trigger the bug in the CI: https://gitlab.com/hzdr/crp/alpaka/-/jobs/5021536224

@bernhardmgruber
Member

Great! That bug was fixed in #2127, but good to know that the CI can save us in the future now!

@AuroraPerego
Contributor

AuroraPerego commented Sep 5, 2023

> Reopening for the warp test.

No idea why, but it has been broken since this commit.

@bernhardmgruber
Member

@j-stephan please have a look at the remaining warpTest, since you have access to an Intel GPU. Thx a lot!

@j-stephan
Member

j-stephan commented Sep 12, 2023

I've identified the issue but have no idea how to solve it: The behaviour of SYCL's sub-group (= warp) functionality differs between Debug and Release mode. For example, this code:

    template<typename TDim>
    struct Activemask<warp::WarpGenericSycl<TDim>>
    {
        static auto activemask(warp::WarpGenericSycl<TDim> const& warp) -> std::uint32_t
        {
            auto const sub_group = warp.m_item_warp.get_sub_group();
            auto const mask = sycl::ext::oneapi::group_ballot(sub_group, true);
            std::uint32_t bits = 0;
            mask.extract_bits(bits);
            return bits;
        }
    };

always returns 0xFFFFFFFF in Debug mode. In Release mode, it will set the correct bits for the active work-items, so for example 0xFFFFFF7F or 0x7FFFFFFF. Intel recently introduced the opportunistic_group extension but this hasn't landed (in a useful way) in the stable oneAPI releases yet.
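
To read the masks quoted above, assuming bit i stands for work-item i of the sub-group: 0xFFFFFF7F has bit 7 cleared and 0x7FFFFFFF has bit 31 cleared. A small host-side helper (hypothetical, not part of alpaka) that lists the cleared bits:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical host-side helper: returns the indices of the cleared bits
// in a 32-bit lane mask, assuming bit i stands for lane i.
inline std::vector<int> inactiveLanes(std::uint32_t mask)
{
    std::vector<int> lanes;
    for(int lane = 0; lane < 32; ++lane)
    {
        if(((mask >> lane) & 1u) == 0u)
            lanes.push_back(lane);
    }
    return lanes;
}
```

Under that assumption, the Debug-mode result 0xFFFFFFFF yields an empty list, which is why the warp tests see every lane as active.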

@fwyzard
Contributor Author

fwyzard commented Sep 12, 2023

Does this mean that sycl::ext::oneapi::group_ballot(sub_group, true) is being used correctly, but it returns the wrong value in Debug mode?
Or that it is being called incorrectly?

@j-stephan
Member

j-stephan commented Sep 12, 2023

I'm not sure. I think technically it is used the wrong way here since group_ballot requires all work-items to call it. So if some of them return before they reach group_ballot it is illegal.

But then again, the opportunistic_group extension is implemented in almost the same way as shown above...

@bernhardmgruber
Member

> I'm not sure. I think technically it is used the wrong way here since group_ballot requires all work-items to call it. So if some of them return before they reach group_ballot it is illegal.

I was searching for some documentation on that! I was wondering whether we simply trigger undefined behavior here, given that group_ballot must be called from all threads in a group. I found this hint on how to migrate from CUDA to SYCL, and it basically says to redesign your algorithm. So I am unsure whether we can implement activemask at all.

> But then again, the opportunistic_group extension is implemented in almost the same way as shown above...

Well, if the vendor can hack, maybe so can we? :)
I found the implementation here.

@bernhardmgruber
Member

Well, there you have it: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:group-functions

> If one work-item in a group calls a group function, then all work-items in that group must call exactly the same function under the same set of conditions --- calling the same function under different conditions (e.g. in different iterations of a loop, or different branches of a conditional statement) results in undefined behavior.

Seems like we cannot implement activemask in SYCL :S

@bernhardmgruber
Member

I propose to either make activemask in SYCL return -1 (all bits set), just as Intel's CUDA -> SYCL migration tool suggests, or static_assert inside, telling the user that they need to rewrite their code so it does not depend on activemask.

@fwyzard
Contributor Author

fwyzard commented Sep 12, 2023

I would favour static_assert.
Making it return -1 is a good way to silently break people's code :-(
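
A rough sketch of the static_assert option, for illustration only: the warp types and the `DependentFalse` helper below are placeholders, not alpaka's real API. The dependent-false idiom makes the assertion fire only if `activemask` is actually instantiated for the SYCL-like warp type, so code that never calls it keeps compiling.

```cpp
#include <cstdint>
#include <type_traits>

// Placeholder warp types standing in for the real backends.
struct WarpSingleThread
{
};

template<typename TDim>
struct WarpSyclLike
{
};

// Always false, but dependent on T, so it is only evaluated on instantiation.
template<typename T>
struct DependentFalse : std::false_type
{
};

template<typename TWarp>
struct Activemask;

// A backend with a well-defined answer (value chosen for the sketch).
template<>
struct Activemask<WarpSingleThread>
{
    static auto activemask(WarpSingleThread const&) -> std::uint32_t
    {
        return 1u;
    }
};

// The SYCL-like backend refuses to compile any call to activemask.
template<typename TDim>
struct Activemask<WarpSyclLike<TDim>>
{
    static auto activemask(WarpSyclLike<TDim> const&) -> std::uint32_t
    {
        static_assert(
            DependentFalse<TDim>::value,
            "activemask is not supported here; restructure the code so it does not depend on it");
        return 0u;
    }
};
```

Compared with returning -1, a call such as `Activemask<WarpSyclLike<int>>::activemask(...)` becomes a compile-time error with a readable message rather than a silently wrong mask.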

bernhardmgruber added a commit to bernhardmgruber/alpaka that referenced this issue Sep 13, 2023
bernhardmgruber added a commit that referenced this issue Sep 13, 2023
j-stephan pushed a commit to j-stephan/alpaka that referenced this issue Sep 27, 2023
bernhardmgruber added a commit that referenced this issue Sep 27, 2023
SimeonEhrig pushed a commit that referenced this issue Nov 15, 2023
* Switch to 1.0.0-rc1

* Silence more nvcc warnings

* Add missing ALPAKA_UNREACHABLE
* Silence more warnings

Co-authored-by: Bernhard Manfred Gruber

* Disable activemask for SYCL

Fixes: #2124

* refactor template order `allocMappedBufIfSupported`

Move template `TPlatform` as the last template. There is no need to provide the platform template signature if we pass the platform as an instance.

* Remove unused aliases

* Only add clang warning flag when supported

* Fix a warning with nvcc

* Workaround gcc warning on uninitialized PlatformCpu

* Fix warnings by clang

* Suppress clang warnings in nvcc generated code

* Pass alpaka_ENABLE_WERROR from environment to CMake

* Disable GCC warning in nvcc generated code

Fixes the following warning:
```
/builds/hzdr/crp/alpaka/test/unit/math/src/Defines.hpp:52:35: error: left operand of comma operator has no effect [-Werror=unused-value]
   52 |                         for(size_t i = 0; i < argsItem.arity_nr; ++i)
      |                          ~~~~~~~~~^~~~~~~~~~
```

* Workaround gcc warning on uninitialized PlatformUniformCudaHipRt

* Fix OpenMP 5.1 atomics

* Workaround clang not recognizing ternary expression
* Implement atomicInc/atomicDec via omp critical

Fixes: #2170

* Add clang-17 to CI

Fixes: #2169

* Rename lambda captures to workaround warnings

clang warns that the names of the captures shadow the outside variables.
E.g.:

QueueUniformCudaHipRt.hpp:215:57: error: declaration shadows a local variable [-Werror,-Wshadow-uncaptured-local]
  215 |                 auto f = queue.m_callbackThread.submit([data = std::move(data)] { data->t(); });
      |                                                         ^

* Exclude clang CUDA Debug builds from the CI

They fail with:
ptxas /tmp/randomStrategies-sm_61-768a17.s, line 14415; fatal   : Parsing error near '.': syntax error
ptxas fatal   : Ptx assembly aborted due to errors
clang++: error: ptxas command failed with exit code 255 (use -v to see invocation)

* Add clang-17 to README.md

CUDA/HIP/SYCL is marked as untested for now.

* Fix clang-format version in docs (#2176)

* Remove unnecessary -fintelfpga flag

* Add changelog for alpaka v1.0.0

* Update author lists

Co-authored-by: Bernhard Manfred Gruber

* Set release date in changelog

---------

Co-authored-by: Jan Stephan
Co-authored-by: René Widera