
[BUG]: __assert_fail redefinition conflict with LibTorch #2571

Closed

osayamenja opened this issue Oct 13, 2024 · 4 comments · Fixed by #2577
Labels
bug Something isn't working right.

Comments

osayamenja commented Oct 13, 2024

Type of Bug

Compile-time Error

Component

libcu++

Describe the bug

CCCL redefines __assert_fail() here, but this causes a linkage conflict/error when building with LibTorch, which also declares the assertion function.

See error below:

error: linkage specification is incompatible with previous "__assert_fail" (declared at line 72 of .../libcudacxx/include/cuda/std/__cccl/assert.h)

My current workaround is to add a header-guard condition eliding the definition in torch. I decided to mention this here, since the issue would probably get missed among the many other open issues on PyTorch's GitHub.

How to Reproduce

Create a C++ project that links both LibTorch and CCCL; this is easily done with CMake.

Suggested CMakeLists.txt

# >= 3.24 for 'native' architecture
cmake_minimum_required(VERSION 3.24 FATAL_ERROR)
project(example-app CUDA CXX)

# Set some standard props
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

set(CUDA_HOST_COMPILER ${CMAKE_CXX_COMPILER})
set(CMAKE_CUDA_ARCHITECTURES "native")
set(CMAKE_CUDA_STANDARD 20)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
 
find_package(CUDAToolkit REQUIRED)

# install from https://download.pytorch.org/libtorch/cu124/libtorch-shared-with-deps-2.4.1%2Bcu124.zip
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(example-app example-app.cu)

set_target_properties(example-app PROPERTIES
    POSITION_INDEPENDENT_CODE ON
    CUDA_SEPARABLE_COMPILATION ON
)

target_link_libraries(example-app PRIVATE "${TORCH_LIBRARIES}")

# Assuming CPM.cmake is available
include(cmake/CPM.cmake)
CPMAddPackage(
        NAME CCCL
        GITHUB_REPOSITORY nvidia/cccl
        GIT_TAG main # Fetches the latest commit on the main branch
)
if(CCCL_ADDED)
    target_link_libraries(example-app PRIVATE CCCL::CCCL)
endif()

example-app.cu

#include <torch/torch.h>
#include <cuda/std/array>
#include <iostream>

int main() {
  cuda::std::array<float, 4> a{{0, 1, 2, 3}};
  const auto t = torch::from_blob(a.data(), {2, 2});
  std::cout << t << std::endl;
}

Expected behavior

The code should compile, but this might be tricky to resolve with torch.

Reproduction link

No response

Operating System

Ubuntu Linux 22.04

nvidia-smi output

Sat Oct 12 20:58:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          Off |   00000000:01:00.0 Off |                    0 |
| N/A   36C    P0             52W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off |   00000000:25:00.0 Off |                    0 |
| N/A   37C    P0             56W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

NVCC version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:10:22_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

@osayamenja osayamenja added the bug Something isn't working right. label Oct 13, 2024
@github-project-automation github-project-automation bot moved this to Todo in CCCL Oct 13, 2024
@osayamenja osayamenja changed the title [BUG]: __assert_fail redefinition conflict [BUG]: __assert_fail redefinition conflict with LibTorch Oct 13, 2024
fbusato (Contributor) commented Oct 14, 2024

Technically, the issue is more on the LibTorch side. We could potentially handle __SYCL_DEVICE_ONLY__ to avoid conflicts, even though it is not intended to be supported by CUDA; see https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html. @miscco

miscco (Collaborator) commented Oct 14, 2024

Actually, I believe the issue is the missing host/device attribute.

miscco (Collaborator) commented Oct 14, 2024

@osayamenja could you check whether just adding _CCCL_HOST_DEVICE instead of extern would fix the redeclaration issue?

osayamenja (Author) commented
@miscco fix works! Thanks!
