Description
Describe the bug
Outside of a function I have a pointer that starts as nullptr, and I pass its address (a pointer to the pointer) into the function.
Inside the function I allocate device memory for a struct, set one of its fields from a kernel, and then assign the address of the new allocation to the pointer passed in (through the double pointer).
I then print the field from a kernel inside the function, and print it again from a kernel after the function returns.
The first print fails, but the second one works.
I'm not sure why :(
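A minimal sketch of the pattern, to make the flow concrete (the function name setup here is only for illustration; the complete reproducer is in the next section):

void setup(MyStruct** ptrToDeviceA, queue& q) {
    // Allocate the struct in device memory and set its field from a kernel
    MyStruct* newDeviceA = malloc_device<MyStruct>(1, q);
    q.single_task([=]() { newDeviceA->value = 100; }).wait();
    // Hand the allocation back through the double pointer
    *ptrToDeviceA = newDeviceA;
    // Printing (*ptrToDeviceA)->value from a kernel here is the print that fails;
    // printing deviceA->value after setup(&deviceA, q) returns is the one that works.
}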
To reproduce
Code Snippet
#include <CL/sycl.hpp>

using namespace cl::sycl;

struct MyStruct {
    int value;
};

void manipulateStruct(MyStruct** ptrToDeviceA, queue& q) {
    // Allocate a new instance of MyStruct in device memory
    MyStruct* newDeviceA = malloc_device<MyStruct>(1, q);

    // Initialize the new instance with a kernel
    q.submit([&](handler& h) {
        h.single_task([=]() {
            newDeviceA->value = 100;
        });
    }).wait();

    // If ptrToDeviceA already points to a device allocation, free it
    if (*ptrToDeviceA != nullptr) {
        free(*ptrToDeviceA, q);
    }

    // Redirect ptrToDeviceA to the new device memory allocation
    *ptrToDeviceA = newDeviceA;

    size_t bufferSize = 1024;
    size_t maxStatementSize = 256;

    // This causes an error....
    q.submit([&](handler& h) {
        stream out(bufferSize, maxStatementSize, h);
        h.single_task([=]() {
            out << (*ptrToDeviceA)->value << sycl::endl;
        });
    }).wait();
}

int main() {
    queue q;

    // Host-side pointer that will hold the device allocation; starts as nullptr
    MyStruct* deviceA = nullptr;

    // Use manipulateStruct to allocate and initialize the struct in device memory
    manipulateStruct(&deviceA, q);

    size_t bufferSize = 1024;
    size_t maxStatementSize = 256;

    // This is completely fine!
    q.submit([&](handler& h) {
        stream out(bufferSize, maxStatementSize, h);
        h.single_task([=]() {
            out << "This many nnz: " << deviceA->value << sycl::endl;
        });
    }).wait();

    free(deviceA, q);
}
Compiled with
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda test.cpp -o test
./test
Error message
UR CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: urEnqueueMemBufferRead
Source Location: /home/temi/sycl_workspace/llvm/build/_deps/unified-runtime-src/source/adapters/cuda/enqueue.cpp:1576
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Aborted (core dumped)
Expected
"This many nnz: 100"
Environment
OS: Linux (Pop!_OS)
Device and Vendor: Nvidia, RTX 2080ti
clang version 19.0.0git (https://github.com/intel/llvm db6a05d)
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06 Driver Version: 545.29.06 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 Ti Off | 00000000:01:00.0 Off | N/A |
| 16% 43C P8 20W / 260W | 466MiB / 11264MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2672 G /usr/lib/xorg/Xorg 301MiB |
| 0 N/A N/A 2807 G /usr/bin/gnome-shell 19MiB |
| 0 N/A N/A 4083 G firefox 77MiB |
| 0 N/A N/A 8540 G ...yOnDemand --variations-seed-version 26MiB |
| 0 N/A N/A 688820 G ...rker,SpareRendererForSitePerProcess 37MiB |
+---------------------------------------------------------------------------------------+
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 2080 Ti 7.5 [CUDA 12.3]
Platforms: 1
Platform [#1]:
Version : CUDA 12.3
Name : NVIDIA CUDA BACKEND
Vendor : NVIDIA Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 7.5
Name : NVIDIA GeForce RTX 2080 Ti
Vendor : NVIDIA Corporation
Driver : CUDA 12.3
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph
ur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
info::device::sub_group_sizes: 32
default_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 2080 Ti 7.5 [CUDA 12.3]
accelerator_selector() : No device of requested type available. -1 (PI_ERRO...
cpu_selector() : No device of requested type available. -1 (PI_ERRO...
gpu_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 2080 Ti 7.5 [CUDA 12.3]
custom_selector(gpu) : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 2080 Ti 7.5 [CUDA 12.3]
custom_selector(cpu) : No device of requested type available. -1 (PI_ERRO...
custom_selector(acc) : No device of requested type available. -1 (PI_ERRO...
Additional context
Let me know if you need any more info.
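In case it helps narrow this down, here is a variant of the failing kernel that captures the freshly allocated device pointer by value instead of dereferencing ptrToDeviceA (which is the address of a host variable) inside the kernel. I haven't run this exact variant, so treat it as a sketch; the helper name printValue is only for illustration and it uses the same includes and types as the reproducer above:

void printValue(MyStruct* devicePtr, queue& q) {
    q.submit([&](handler& h) {
        stream out(1024, 256, h);
        h.single_task([=]() {
            // devicePtr is a device USM pointer captured by value
            out << devicePtr->value << sycl::endl;
        });
    }).wait();
}
// Inside manipulateStruct this would be called as printValue(newDeviceA, q)
// right after *ptrToDeviceA = newDeviceA.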