Description
Describe the bug
Outside of a function I have a pointer that starts as nullptr, and I pass its address (a pointer to the pointer) into the function.
Inside the function I allocate device memory for a struct, set one of its fields from a kernel, and then assign the address of the new allocation to the pointer passed in (through the double pointer).
I then print the field from a kernel inside the function, and print it again from a kernel after the function returns.
The first print fails, but the second one works.
I'm not sure why :(
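A minimal sketch of the pattern, to make the flow concrete (the function name setup here is only for illustration; the complete reproducer is in the next section):

void setup(MyStruct** ptrToDeviceA, queue& q) {
    // Allocate the struct in device memory and set its field from a kernel
    MyStruct* newDeviceA = malloc_device<MyStruct>(1, q);
    q.single_task([=]() { newDeviceA->value = 100; }).wait();
    // Hand the allocation back through the double pointer
    *ptrToDeviceA = newDeviceA;
    // Printing (*ptrToDeviceA)->value from a kernel here is the print that fails;
    // printing deviceA->value after setup(&deviceA, q) returns is the one that works.
}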
To reproduce
Code Snippet
#include <CL/sycl.hpp>

using namespace cl::sycl;

struct MyStruct {
    int value;
};

void manipulateStruct(MyStruct** ptrToDeviceA, queue& q) {
    // Allocate a new instance of MyStruct in device memory
    MyStruct* newDeviceA = malloc_device<MyStruct>(1, q);

    // Initialize the new instance with a kernel
    q.submit([&](handler& h) {
        h.single_task([=]() {
            newDeviceA->value = 100;
        });
    }).wait();

    // If ptrToDeviceA already points to a device allocation, free it
    if (*ptrToDeviceA != nullptr) {
        free(*ptrToDeviceA, q);
    }

    // Redirect ptrToDeviceA to the new device memory allocation
    *ptrToDeviceA = newDeviceA;

    size_t bufferSize = 1024;
    size_t maxStatementSize = 256;

    // This causes an error....
    q.submit([&](handler& h) {
        stream out(bufferSize, maxStatementSize, h);
        h.single_task([=]() {
            out << (*ptrToDeviceA)->value << sycl::endl;
        });
    }).wait();
}

int main() {
    queue q;

    // Host-side pointer that will hold the device allocation; starts as nullptr
    MyStruct* deviceA = nullptr;

    // Use manipulateStruct to allocate and initialize the struct in device memory
    manipulateStruct(&deviceA, q);

    size_t bufferSize = 1024;
    size_t maxStatementSize = 256;

    // This is completely fine!
    q.submit([&](handler& h) {
        stream out(bufferSize, maxStatementSize, h);
        h.single_task([=]() {
            out << "This many nnz: " << deviceA->value << sycl::endl;
        });
    }).wait();

    free(deviceA, q);
}
Compiled with
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda test.cpp -o test
./test
Error message
UR CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: urEnqueueMemBufferRead
Source Location: /home/temi/sycl_workspace/llvm/build/_deps/unified-runtime-src/source/adapters/cuda/enqueue.cpp:1576
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Aborted (core dumped)
Expected
"This many nnz: 100"
Environment
OS: Linux (Pop!_OS)
Device and Vendor: Nvidia, RTX 2080ti
clang version 19.0.0git (https://github.com/intel/llvm db6a05d)
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06 Driver Version: 545.29.06 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 Ti Off | 00000000:01:00.0 Off | N/A |
| 16% 43C P8 20W / 260W | 466MiB / 11264MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2672 G /usr/lib/xorg/Xorg 301MiB |
| 0 N/A N/A 2807 G /usr/bin/gnome-shell 19MiB |
| 0 N/A N/A 4083 G firefox 77MiB |
| 0 N/A N/A 8540 G ...yOnDemand --variations-seed-version 26MiB |
| 0 N/A N/A 688820 G ...rker,SpareRendererForSitePerProcess 37MiB |
+---------------------------------------------------------------------------------------+
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 2080 Ti 7.5 [CUDA 12.3]
Platforms: 1
Platform [#1]:
Version : CUDA 12.3
Name : NVIDIA CUDA BACKEND
Vendor : NVIDIA Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 7.5
Name : NVIDIA GeForce RTX 2080 Ti
Vendor : NVIDIA Corporation
Driver : CUDA 12.3
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph
ur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
info::device::sub_group_sizes: 32
default_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 2080 Ti 7.5 [CUDA 12.3]
accelerator_selector() : No device of requested type available. -1 (PI_ERRO...
cpu_selector() : No device of requested type available. -1 (PI_ERRO...
gpu_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 2080 Ti 7.5 [CUDA 12.3]
custom_selector(gpu) : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 2080 Ti 7.5 [CUDA 12.3]
custom_selector(cpu) : No device of requested type available. -1 (PI_ERRO...
custom_selector(acc) : No device of requested type available. -1 (PI_ERRO...
Additional context
Let me know if you need any more info.
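In case it helps narrow this down, here is a variant of the failing kernel that captures the freshly allocated device pointer by value instead of dereferencing ptrToDeviceA (which is the address of a host variable) inside the kernel. I haven't run this exact variant, so treat it as a sketch; the helper name printValue is only for illustration and it uses the same includes and types as the reproducer above:

void printValue(MyStruct* devicePtr, queue& q) {
    q.submit([&](handler& h) {
        stream out(1024, 256, h);
        h.single_task([=]() {
            // devicePtr is a device USM pointer captured by value
            out << devicePtr->value << sycl::endl;
        });
    }).wait();
}
// Inside manipulateStruct this would be called as printValue(newDeviceA, q)
// right after *ptrToDeviceA = newDeviceA.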