Intel Pytorch on Intel(R) UHD Graphics #802

Open
jmurel opened this issue Apr 4, 2025 · 2 comments

jmurel commented Apr 4, 2025

Describe the issue

I am trying to run language model inference on a computer with an Intel processor. I installed PyTorch for Intel GPUs (along with the Intel driver and deep learning tools) following the PyTorch and Intel instructions:

Getting Started on Intel GPU — PyTorch 2.6 documentation

PyTorch Prerequisites for Intel® GPUs

This is the processor on my computer:

13th Gen Intel(R) Core(TM) i5-1335U, 1300 Mhz, 10 Core(s), 12 Logical Processor(s)

I run this script:

import torch
import intel_extension_for_pytorch
import logging
import platform

def load_device(model: torch.nn.Module):

    logging.info(f"CUDA available: {torch.cuda.is_available()}")

    if torch.cuda.is_available():
        logging.info(f"CUDA version: {torch.version.cuda}")
        device = torch.device("cuda")
    elif platform.system() == "Darwin" and torch.backends.mps.is_available():
        logging.info("MPS Metal available")
        device = torch.device("mps")
    elif hasattr(torch, 'xpu') and torch.xpu.is_available():
        logging.info("Intel XPU available")
        device = torch.device("xpu")
    else:
        device = torch.device("cpu")

    logging.info(f"Using device {device}.")
    model.to(device)
    return device, model

My log then states "Using device xpu." This leads me to believe that the issue is not with PyTorch or my processor.
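
For what it's worth, a smoke test along these lines (a minimal sketch, not from my application code) would confirm that the device actually executes kernels, and not just that it is selectable:

import torch

# Minimal smoke test: run a real matmul on the xpu device rather than
# only constructing torch.device("xpu"). A broken driver/runtime tends
# to fail here rather than only deep inside a model forward pass.
if torch.xpu.is_available():
    x = torch.randn(256, 256, device="xpu")
    y = x @ x
    torch.xpu.synchronize()  # force the queued kernel to actually run
    print("xpu matmul ok:", tuple(y.shape))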

The error arises when I pass the model and device to my masked language modeling inference code. Here is my inference code:

import torch
from collections import defaultdict
import logging
import numpy as np


def prediction_function(
    text: str,
    model,
    tokenizer,
    device,
    window_size: int = 512,
    overlap: int = 128,
    num_predictions: int = 5,
):

    all_predictions = defaultdict(list)
    tokens = tokenizer.encode(text, add_special_tokens=False)
    num_tokens = len(tokens)

    # overlapping window loop to process text beyond 512 tokens
    for i in range(0, num_tokens, window_size - overlap):
        chunk_ids = tokens[i : min(i + window_size, num_tokens)]
        chunk_ids = chunk_ids[:512]
        chunk = tokenizer.decode(chunk_ids)
        chunk_inputs = tokenizer(
            chunk,
            return_tensors="pt",
            return_attention_mask=True,
            add_special_tokens=True,
            truncation=True,
            max_length=512,
        )

        chunk_inputs = {k: v.to(device) for k, v in chunk_inputs.items()}

        with torch.no_grad():
            outputs = model(**chunk_inputs)
            predictions = outputs.logits

        masked_indices = [
            i
            for i, token_id in enumerate(chunk_inputs["input_ids"][0])
            if token_id == tokenizer.mask_token_id
        ]
        logging.info(masked_indices)

        for masked_index in masked_indices:
            predicted_probs = predictions[0, masked_index]
            sorted_preds, sorted_idx = torch.sort(predicted_probs, descending=True)
            masked_predictions = []
            for k in range(num_predictions):
                predicted_index = int(sorted_idx[k].item())
                predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
                probability = torch.softmax(predicted_probs, dim=-1)[
                    predicted_index
                ].item()
                masked_predictions.append((predicted_token, probability))
            logging.info(f"Predictions for {masked_index}: {masked_predictions}")
            all_predictions[masked_index + i].extend(masked_predictions)
    logging.info(f"All predictions: {all_predictions}")

    final_predictions = {}
    for masked_index, prediction_list in all_predictions.items():
        # group subword predictions
        subword_groups: dict = {}
        for token, prob in prediction_list:
            if token.startswith("##"):
                base_word = token[2:]  # remove "##" prefix
                if base_word not in subword_groups:
                    subword_groups[base_word] = []
                subword_groups[base_word].append((token, prob))
            else:  # whole word token
                subword_groups[token] = [(token, prob)]
        logging.info(f"Subword groups: {subword_groups}")

        whole_word_predictions = []
        for base_word, subword_list in subword_groups.items():
            max_prob = 0.0
            for subtoken, prob in subword_list:
                if prob > max_prob:
                    max_prob = prob

            whole_word_predictions.append((base_word, max_prob))

        # sort by prob
        sorted_predictions = sorted(
            whole_word_predictions, key=lambda x: x[1], reverse=True
        )
        # keep top num_predictions
        final_predictions[masked_index] = sorted_predictions[:num_predictions]

    logging.info(type(final_predictions))
    logging.info(f"Final predictions: {final_predictions}")

    return final_predictions
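
For context, the function is called roughly like this (a minimal sketch; the checkpoint name is a placeholder, not my actual model):

from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder masked-LM checkpoint, for illustration only.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

device, model = load_device(model)
model.eval()

text = f"The capital of France is {tokenizer.mask_token}."
results = prediction_function(text, model, tokenizer, device, num_predictions=5)
print(results)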

I routinely receive some form of runtime error, such as:

Traceback (most recent call last):
File "C:\Users\jm9095\logion-app\src\backend\main.py", line 181, in prediction_endpoint
results = predict.prediction_function(
text,
...<5 lines>...
num_predictions=5,
)
File "C:\Users\jm9095\logion-app\src\backend\prediction\predict.py", line 59, in prediction_function
outputs = model(**chunk_inputs)
File "C:\Users\jm9095\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users\jm9095\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\jm9095\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\bert\modeling_bert.py", line 1461, in forward
outputs = self.bert(
input_ids,
...<9 lines>...
return_dict=return_dict,
)
File "C:\Users\jm9095\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users\jm9095\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\jm9095\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\bert\modeling_bert.py", line 1108, in forward
extended_attention_mask = _prepare_4d_attention_mask_for_sdpa(
attention_mask, embedding_output.dtype, tgt_len=seq_length
)
File "C:\Users\jm9095\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_attn_mask_utils.py", line 448, in _prepare_4d_attention_mask_for_sdpa
if not is_tracing and torch.all(mask == 1):
~~~~~~~~~^^^^^^^^^^^
RuntimeError: Native API failed. Native API returns: 20 (UR_RESULT_ERROR_DEVICE_LOST)

Most recently, I encountered this error without making any changes to my code or system:

Traceback (most recent call last):
File "C:\Users\jm9095\logion-app\src\backend\main.py", line 181, in prediction_endpoint
results = predict.prediction_function(
text,
...<5 lines>...
num_predictions=5,
)
File "C:\Users\jm9095\logion-app\src\backend\prediction\predict.py", line 74, in prediction_function
predicted_index = int(sorted_idx[k].item())
~~~~~~~~~~~~~~~~~~^^
RuntimeError: Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)

I wonder whether the issue is caused by the Intel driver, since I do not encounter this problem with other accelerators (e.g. NVIDIA, Apple M-series) or when running on my PC's CPU. That is, the issue only arises when running on the xpu device. But given that PyTorch does load the xpu device successfully, I am not sure that diagnosis is correct. If it is not, might anyone have suggestions on the source of the runtime error?
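
To isolate this, the kind of stripped-down comparison I have in mind is the following (again a sketch with a placeholder checkpoint): running the identical forward pass on cpu and on xpu, so that a failure appearing only in the xpu iteration points at the backend rather than my pipeline:

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = f"Paris is the {tokenizer.mask_token} of France."
inputs = tokenizer(text, return_tensors="pt")

for dev in ("cpu", "xpu"):
    # Identical model and inputs on each device; only the backend differs.
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").to(dev)
    batch = {k: v.to(dev) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**batch).logits
    print(dev, tuple(logits.shape), logits.dtype)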

@feng-intel feng-intel self-assigned this Apr 7, 2025

feng-intel commented Apr 7, 2025

It looks like your platform is Windows with a CPU-integrated GPU (iGPU).
Could you follow this page, https://pytorch-extension.intel.com/installation?platform=gpu&version=v2.6.10%2Bxpu&os=windows&package=pip, and check with:

python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
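
The same check, unrolled for readability (functionally equivalent to the one-liner above):

import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__)
print(ipex.__version__)

# Enumerate every XPU device the runtime exposes, with its properties.
for i in range(torch.xpu.device_count()):
    print(f"[{i}]: {torch.xpu.get_device_properties(i)}")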


jmurel commented Apr 8, 2025

@feng-intel This is the output when I check:

[W408 09:36:25.000000000 OperatorEntry.cpp:161] Warning: Warning only once for all operators,  other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::_validate_compressed_sparse_indices(bool is_crow, Tensor compressed_idx, Tensor plain_idx, int cdim, int dim, int nnz) -> ()
    registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterSchema.cpp:6
  dispatch key: XPU
  previous kernel: registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterCPU.cpp:30477
       new kernel: registered at E:\frameworks.ai.pytorch.ipex-gpu\build\Release\csrc\gpu\csrc\aten\generated\ATen\RegisterXPU.cpp:468 (function operator ())
2.6.0.post0+xpu
[0]: _XpuDeviceProperties(name='Intel(R) UHD Graphics', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.32413', total_memory=7102MB, max_compute_units=80, gpu_eu_count=80, gpu_subslice_count=10, max_work_group_size=512, max_num_sub_groups=64, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=0, has_atomic64=1)
