
RuntimeError: UR error on ADL-N with IPEX 2.5/2.6 using .to(torch.float16) #800

Open
blue-notes-robot opened this issue Apr 1, 2025 · 5 comments
Assignees: ZailiWang
Labels: Crash (Execution crashes), XPU/GPU (XPU/GPU specific issues)

Comments

@blue-notes-robot

Describe the bug

Problem statement
Executing basic PyTorch operations on an Intel N150 (Alder Lake N "ADL-N") integrated GPU using the xpu device consistently fails with RuntimeError: UR error. This occurs even for fundamental operations like data type conversion (.to(torch.float16)).

Environments

  • Self-compiled IPEX/PyTorch: IPEX v2.5.10+xpu / PyTorch v2.5.1 built from source using the compile_bundle.sh script against the oneAPI 2025.0.2 toolkit (from intel/oneapi-basekit:2025.0.2-0-devel-ubuntu24.04).
  • Official Pre-built Docker Images: intel/intel-extension-for-pytorch:2.5.10-xpu and intel/intel-extension-for-pytorch:2.6.10-xpu.
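
As a quick sanity check (not part of the original report) that the container actually sees the iGPU, the torch.xpu query helpers exposed by IPEX can be run first; a minimal sketch:

import torch
import intel_extension_for_pytorch  # registers the 'xpu' backend

# Confirm that the XPU runtime enumerates the ADL-N iGPU before attempting the failing conversion.
print("xpu available:", torch.xpu.is_available())
print("device count :", torch.xpu.device_count())
if torch.xpu.is_available():
    print("device name  :", torch.xpu.get_device_name(0))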

MWE
The following minimal command reliably reproduces the error inside the affected Docker environments (both custom-built and official) when run with access to the host GPU:

SYCL_UR_TRACE=1 python -c "import torch; import intel_extension_for_pytorch; t = torch.tensor([0], device='xpu'); print(t.to(torch.float16))"
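
The same reproduction as a standalone script, with a CPU baseline added for contrast (the baseline is an addition for illustration; only the XPU-side conversion is expected to fail):

import torch
import intel_extension_for_pytorch  # noqa: F401  # needed to register the 'xpu' backend

# The equivalent conversion on CPU succeeds, which narrows the failure to the XPU path.
cpu_t = torch.tensor([0])
print(cpu_t.to(torch.float16))        # tensor([0.], dtype=torch.float16)

# The XPU-side conversion is what raises "RuntimeError: UR error" on ADL-N.
xpu_t = torch.tensor([0], device='xpu')
print(xpu_t.to(torch.float16))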

Output
Running the minimal example produces the following output, including UR loader messages and the final traceback. This output is consistent across the self-built environment and the official 2.5.10-xpu / 2.6.10-xpu Docker images.

[W401 10:47:29.656434432 OperatorEntry.cpp:155] Warning: Warning only once for all operators,  other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::_cummax_helper(Tensor self, Tensor(a!) values, Tensor(b!) indices, int dim) -> ()
    registered at /ipex/pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: XPU
  previous kernel: registered at /ipex/pytorch/build/aten/src/ATen/RegisterCPU.cpp:30476
       new kernel: registered at /ipex/intel-extension-for-pytorch/build/Release/csrc/gpu/csrc/aten/generated/ATen/RegisterXPU.cpp:2971 (function operator())
<LOADER>[INFO]: loaded adapter 0x0x61763fc88180 (libur_adapter_level_zero.so.0)
<LOADER>[INFO]: loaded adapter 0x0x61763fc89cc0 (libur_adapter_opencl.so.0)
<LOADER>[INFO]: failed to load adapter 'libur_adapter_cuda.so.0' with error: libur_adapter_cuda.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_cuda.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_cuda.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter 'libur_adapter_hip.so.0' with error: libur_adapter_hip.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_hip.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_hip.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter 'libur_adapter_native_cpu.so.0' with error: libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_native_cpu.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: UR error

SYCL_UR_TRACE=-1 reveals:

(.hQueue = 0x585e6230dc20, .hKernel = 0x585e6244def0, .workDim = 1, .pGlobalWorkOffset = 0x585e5f064b40 (0), .pGlobalWorkSize = 0x585e5f064b10 (224), .pLocalWorkSize = 0x585e5f064b28 (224), .numEventsInWaitList = 0, .phEventWaitList = {}, .phEvent = 0x7ffca67e6718 (0x585e6244f6a0)) -> UR_RESULT_ERROR_INVALID_ARGUMENT;
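
The rejected call is the kernel launch itself: a 1-D launch with a local work size of 224 fails with UR_RESULT_ERROR_INVALID_ARGUMENT. A sketch for comparing that value against the device limits (assuming the max_work_group_size field reported by IPEX's torch.xpu.get_device_properties; not part of the original report):

import torch
import intel_extension_for_pytorch  # noqa: F401

# Print the device limits relevant to the rejected launch (requested local work size: 224).
props = torch.xpu.get_device_properties(0)
print(props.name)
print("max_work_group_size:", getattr(props, "max_work_group_size", "field not exposed on this build"))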

This might be related to this issue in the PyTorch repo, but I am posting here since I compiled via IPEX and am not 100% sure about the relevance.

Versions

env.txt

@ZailiWang self-assigned this Apr 1, 2025
@ZailiWang
Contributor

Thanks for reporting this; we will take a look and get back to you.

@ZailiWang
Contributor

ZailiWang commented Apr 3, 2025

Hi, could you run the command with unitrace and provide the output log?
unitrace can be installed following the guidance here. Basically, the steps are:

  • Ensure the oneAPI basekit env variables are activated (via e.g. source /opt/intel/oneapi/setvars.sh)
  • mkdir build && cd build
  • cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_WITH_MPI=0 ..
  • make install

You may need to log in as root in case an authentication error is raised like:

CMake Error at cmake_install.cmake:52 (file):
  file INSTALL cannot copy file
  "/test/unitrace/pti-gpu/tools/unitrace/build/unitrace" to
  "/usr/local/bin/unitrace": Success.

After the installation completes, try this command:

NEOReadDebugKeys=1 PrintDebugSettings=1 PrintDebugMessages=1 LogAlignedAllocations=1 LogAllocationMemoryPool=1 LogAllocationType=1 LogAllocationStdout=1 nohup unitrace python -c "import torch;  import intel_extension_for_pytorch; t = torch.tensor([0], device='xpu'); print(t.to(torch.float16))" > unitrace_out.log &

@ZailiWang added the XPU/GPU (XPU/GPU specific issues) and Crash (Execution crashes) labels Apr 3, 2025
@blue-notes-robot
Author

Here: unitrace_out.log

@Stonepia
Contributor

Stonepia commented Apr 7, 2025

Hi @blue-notes-robot, thanks for the report! I could reproduce it on my local machine. I believe this is a driver bug; since the (12th Gen) iGPU is old and not in our test matrix, there may be bugs we did not track before.

I have submitted an internal ticket to the driver team and will get back to you when there is an update.

@Wetitpig

Just to supplement: this error is also reproducible on an Intel Core i7-12700H (also 12th Gen). The most affected operations are still .to(torch.dtype) conversions.
