[Lunar Lake] UR_RESULT_ERROR_DEVICE_LOST #780
@rpolyano
Hi Louie, sorry for the delayed response. Just to note: I am on an ASUS Zenbook S14, an Intel Lunar Lake laptop. xpu-smi does not seem to work:
However:
So it seems Python can see the device, but xpu-smi cannot. I do not have CUDA on the same machine.
IPEX 2.6.1 did not solve this issue, and I think enough time has passed between an IPEX release and the launch of Lunar Lake to warrant getting this solved properly. Running the default workflow in ComfyUI with Stable Diffusion 1.5 yields a similar error.
Seems odd that this can just happen like that.
If I use the nightly PyTorch wheel intended for BMG on Linux at #764 (comment) on my LNL laptop, the device-lost issue seems to be fixed, but the Stable Diffusion 1.5 default workflow still fails, producing either a black image or a garbage image. So there is more work left to be done. Digging further, Lunar Lake still isn't officially supported by PyTorch in any capacity on Ubuntu 24.04, which is what I have; that may be the cause, but I doubt it. Not sure if it is good enough to run LLMs; I don't have the model the issue opener used, so I cannot test.
I have this problem, too.
IPEX 2.6 and 2.5 produce the same error. Currently only the FLUX model has this problem. Is it driver-related or application-related?
This might be a known issue. When system memory is under high pressure, it can cause this UR error: you will see that system memory is almost full, and then you get the UR error. We have reported it to the driver team for a fix. For now, I would suggest:
Related: intel/torch-xpu-ops#1324
In the example I gave, I am using Stable Diffusion 1.5 with the default workflow, generating a single image, and it still fails with this error. That is about as lightweight a model and image-diffusion example as you can run. The LNL laptop I use has 32 GB of RAM, half of which is allocated as VRAM to the iGPU. It doesn't make sense to me that it would fail for that reason.
Well, for SD 1.5 there is no reason it should fail with this error. Could you take a look at how much memory it uses while the model runs? Since you seem to be using Linux, you may use
If you are using Windows, you can simply check Task Manager.
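The specific Linux command suggested above was not preserved in this copy of the thread. As one hypothetical way to watch system-memory pressure while the model runs (a sketch, assuming a standard Linux /proc filesystem; the helper names are mine, not from the thread):

```python
# Hypothetical memory watcher for Linux; reads /proc/meminfo directly.
import time


def meminfo_kib():
    """Return MemTotal and MemAvailable (in KiB) as a dict."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("MemTotal", "MemAvailable"):
                info[key] = int(rest.split()[0])  # values are reported in kB
    return info


def watch(seconds=10, interval=1.0):
    """Print used-memory percentage once per interval, for `seconds` seconds."""
    for _ in range(int(seconds / interval)):
        m = meminfo_kib()
        used = 100 * (1 - m["MemAvailable"] / m["MemTotal"])
        print(f"memory used: {used:.1f}%")
        time.sleep(interval)
```

Running `watch()` in a second terminal while the workflow executes would show whether memory really climbs toward full before the UR error appears.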
Lunar Lake does not work with
Thanks for the patience~! Honestly I don't have a good clue yet, but let's start from the reproducer in this thread. Meanwhile, if you have a reproducer, you are welcome to post it and we can try it locally.
@Stonepia I found out I wasn't running the latest torch nightly, due to another, unrelated issue on my LNL laptop. While testing, I found that torch nightly 2.8.0.dev20250321+xpu fixes the lost-device issue under normal workloads. The only issue left is the one you linked: when too much memory is used, the UR error appears, the system does not handle itself gracefully, and it sometimes hard-locks.
This should be the known issue. When memory pressure is too high, the context will be broken; in extreme cases this results in UR_ERROR_DEVICE_LOST.
An internal issue has been created, and we will keep you posted.
Describe the bug
Trying to load the openbmb/MiniCPM-o-2_6 model results in UR_RESULT_ERROR_DEVICE_LOST. If I add .eval() after .model(), it fully crashes my entire desktop and sends me back to the login screen. I have also tried this in the docker.io/intel/intel-extension-for-pytorch:2.5.10-xpu docker container, with the same result. Full code snippet:
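The reporter's actual snippet was lost in this copy of the thread. As a hedged sketch of what loading this model on XPU typically looks like (only the model id comes from the report; the function name, dtype, and API calls are assumptions based on the standard transformers loading pattern):

```python
# Hypothetical reconstruction of the reporter's loading code; the original
# snippet was not preserved. Assumes transformers and a PyTorch build with
# XPU support are installed; imports are deferred so the file parses without them.
def load_minicpm(device="xpu"):
    import torch
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(
        "openbmb/MiniCPM-o-2_6",
        trust_remote_code=True,       # MiniCPM-o ships custom modeling code
        torch_dtype=torch.bfloat16,   # dtype is an assumption, not from the report
    )
    # Per the report: without .eval() loading fails with
    # UR_RESULT_ERROR_DEVICE_LOST; with .eval() the whole desktop session crashes.
    model = model.eval().to(device)
    tokenizer = AutoTokenizer.from_pretrained(
        "openbmb/MiniCPM-o-2_6", trust_remote_code=True
    )
    return model, tokenizer
```

This is a sketch for orientation only; the exact call chain the reporter used (including the .model() call mentioned above) may differ.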
Versions
Traceback (most recent call last):
  File ".../collect_env.py", line 19, in <module>
    import intel_extension_for_pytorch as ipex
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/intel_extension_for_pytorch/__init__.py", line 147, in <module>
    from . import _dynamo
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/intel_extension_for_pytorch/_dynamo/__init__.py", line 4, in <module>
    from torch._inductor.compile_fx import compile_fx
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 49, in <module>
    from torch._inductor.debug import save_args_for_compile_fx_inner
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/torch/_inductor/debug.py", line 26, in <module>
    from . import config, ir  # noqa: F811, this is needed
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/torch/_inductor/ir.py", line 77, in <module>
    from .runtime.hints import ReductionHint
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/torch/_inductor/runtime/hints.py", line 36, in <module>
    attr_desc_fields = {f.name for f in fields(AttrsDescriptor)}
  File "/home/roman/.pyenv/versions/3.12.8/lib/python3.12/dataclasses.py", line 1289, in fields
    raise TypeError('must be called with a dataclass type or instance') from None
TypeError: must be called with a dataclass type or instance