
Feature Request: Allow "Best-Effort" Optimization for Custom Models via ipex.llm.optimize on XPUs #807


Open
unrahul opened this issue Apr 8, 2025 · 4 comments
Assignees: wangkl2
Labels: Feature, LLM, XPU/GPU (XPU/GPU specific issues)

Comments

unrahul commented Apr 8, 2025

Describe the issue

Feature Request: Allow "Best-Effort" Optimization for Custom Models via ipex.llm.optimize

Motivation:
The ipex.llm.optimize API is powerful for accelerated inference of supported LLM families on XPUs. However, its current design seems tightly coupled to these specific, verified architectures, and the more general ipex.optimize is often limited in what it can do for LLMs.

Problem Description:
Developers working with custom decoder models face challenges when trying to leverage ipex.llm.optimize.
These models include:

  • Smaller, domain-specific decoders tailored for particular tasks.
  • Decoder components within larger Vision Language Models (VLMs).
  • Novel architectures developed during research.

Currently, applying ipex.llm.optimize to such models often requires non-trivial workarounds, such as modifying the model's config.json or using monkey-patching techniques to make the model appear as one of the supported types. This process is indirect, adds development overhead, and isn't guaranteed to apply optimizations correctly.
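For illustration, here is a rough sketch of the kind of workaround this currently requires; the model class, repo name, and the choice of "llama" as the impersonated family are all hypothetical:

    import torch
    import intel_extension_for_pytorch as ipex

    model = MyCustomDecoder.from_pretrained("my-org/my-decoder")  # hypothetical
    original_model_type = getattr(model.config, "model_type", None)

    # Masquerade as a supported family so ipex.llm.optimize accepts the model.
    model.config.model_type = "llama"
    model.config.architectures = ["LlamaForCausalLM"]

    model = ipex.llm.optimize(model, dtype=torch.bfloat16, device="xpu", inplace=True)

    # Restore the original identity afterwards.
    if original_model_type is not None:
        model.config.model_type = original_model_type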

Proposed Solution:
Introduce a pathway for ipex.llm.optimize to apply optimizations on a "best-effort" basis to models not explicitly listed as supported. This could involve:

  1. An Opt-in Mechanism: A boolean flag like attempt_optimization_on_unsupported=True could allow users to explicitly request optimization, acknowledging it might not be fully tuned or guaranteed (see the sketch after this list).
  2. Heuristic-Based Optimization: The optimizer could inspect the provided torch.nn.Module and apply optimizations known to be generally applicable to transformer decoder blocks (e.g., optimizing linear layers, specific activation functions, KV caching if patterns are detected) without relying on exact model family identification.
  3. User Hints (Optional): Potentially allow users to provide basic hints about the model structure if needed (though a fully automatic approach is preferred).
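To make point 1 concrete, here is a sketch of how the opt-in call might look; attempt_optimization_on_unsupported is the flag proposed above, not an existing parameter:

    import torch
    import intel_extension_for_pytorch as ipex

    # Proposed usage -- attempt_optimization_on_unsupported does not exist today.
    model = ipex.llm.optimize(
        my_custom_decoder,  # any torch.nn.Module containing decoder blocks
        dtype=torch.bfloat16,
        device="xpu",
        attempt_optimization_on_unsupported=True,
    )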

Benefits:

  • Reduced Friction: Lowers the barrier for developers to experiment with IPEX optimizations on custom models.
  • Faster Iteration: Enables quicker testing and deployment of optimized custom architectures.
  • Broader Applicability: Extends the reach and utility of IPEX optimizations beyond the core supported model list.
  • Flexibility: Allows optimizing components (like VLM decoders) independently.

Conclusion:
Providing a mechanism, even if experimental or "best-effort," to apply ipex.llm.optimize to a wider range of decoder-like models would be a valuable addition for the community building and deploying custom AI solutions on Intel hardware.

wangkl2 added the XPU/GPU, Feature, and LLM labels Apr 9, 2025
wangkl2 self-assigned this Apr 9, 2025
wangkl2 (Member) commented Apr 9, 2025

@unrahul Thanks for your suggestions about flexible support of the ipex.llm.optimize API for customized models.

  • For models that are not in our support list, it should theoretically still function: invoking ipex.llm.optimize falls back to applying ipex.optimize to try to get whatever perf benefits are available (see the sketch after this list).
  • For CPU, we provide a guide on leveraging the LLM module-level optimization APIs in https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm-modeling to help optimize private/customized LLMs with commonly used modules or functions via ipex.llm.
  • For XPU, we keep enabling and adding well-optimized LLMs/model families to our support scope in each release.
  • You can refer to our commits for the support of new LLMs, such as 94379e2 and c45a3c1, to see how to enable your customized models with the optimized transformers. Contributions from the community are warmly welcomed.
  • And please let us know if there are any specific models/model architectures that you hope to be supported/optimized.
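For reference, a minimal sketch of the fallback path described in the first bullet, assuming an arbitrary HF causal-LM checkpoint (the repo name is hypothetical):

    import torch
    import intel_extension_for_pytorch as ipex
    from transformers import AutoModelForCausalLM

    # Any decoder checkpoint; if it is outside the verified list, ipex.llm.optimize
    # is expected to fall back to generic optimizations rather than fail.
    model = AutoModelForCausalLM.from_pretrained("my-org/custom-decoder")  # hypothetical
    model = model.eval().to("xpu")
    model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")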

unrahul (Author) commented Apr 9, 2025

hi @wangkl2, thank you. For example, moondream is an interesting model that I have seen many startups using, and it would benefit a lot from being optimized with ipex.llm.optimize.

I would still strongly request at least a developer note on how to optimize non-supported models. As you know, there are hundreds of models, and it would really help developers building custom inference solutions to have an example of how to optimize on XPUs using building blocks, similar to the CPU section.

This would avoid non-standard wrapper code for optimizations, which is what I am writing now.

For example, here is how I am optimizing moondream:

    # Requires: logging (logger), torch, intel_extension_for_pytorch as ipex,
    # and transformers.PretrainedConfig (used as a fallback config below).
    def _apply_component_specific_optimizations(self, model, ipex_config):
        """
        Apply component-specific optimizations to the model.
        Falls back gracefully if components are not found or errors occur.

        Args:
            model: The loaded model
            ipex_config: IPEX configuration dictionary

        Returns:
            The model with component-specific optimizations applied where possible
        """
        try:
            logger.info(f"Model Name: {type(model).__name__}")
            if hasattr(model, "model") and isinstance(model.model, torch.nn.Module):
                logger.info("Found nested model structure (HfMoondream.model)")
                target_model = model.model
            else:
                logger.info("Using direct model structure")
                target_model = model
            if hasattr(target_model, "text") and isinstance(
                target_model.text, torch.nn.Module
            ):
                logger.info(
                    "Optimizing language model component with ipex.llm.optimize..."
                )
                text_model = target_model.text
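                # Masquerade as a supported family ("phi") so ipex.llm.optimize
                # accepts this custom decoder; the original value is restored later.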
                original_config = None
                if hasattr(text_model, "config"):
                    if hasattr(text_model.config, "model_type"):
                        original_config = text_model.config.model_type
                    text_model.config.model_type = "phi"

                    if (
                        not hasattr(text_model.config, "architectures")
                        or not isinstance(text_model.config.architectures, list)
                        or len(text_model.config.architectures) == 0
                    ):
                        text_model.config.architectures = ["PhiForCausalLM"]
                    if not hasattr(text_model.config, "hidden_size"):
                        for name, module in text_model.named_modules():
                            if isinstance(module, torch.nn.Linear):
                                text_model.config.hidden_size = module.in_features
                                break
                else:
                    text_model.config = PretrainedConfig()
                    text_model.config.model_type = "phi"
                    text_model.config.architectures = ["PhiForCausalLM"]
                added_model_wrapper = False
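                # ipex.llm.optimize appears to look for decoder layers under
                # model.layers (the HF convention), so expose moondream's
                # blocks there via a thin wrapper.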
                if hasattr(text_model, "blocks") and isinstance(
                    text_model.blocks, torch.nn.ModuleList
                ):

                    class ModelWrapper(torch.nn.Module):
                        def __init__(self, blocks):
                            super().__init__()
                            self.layers = blocks

                    text_model.model = ModelWrapper(text_model.blocks)
                    added_model_wrapper = True

                try:
                    target_model.text = ipex.llm.optimize(
                        target_model.text,
                        dtype=self.dtype,
                        device=self.device.type,
                        inplace=True,
                    )
                    logger.info(
                        "Successfully optimized llm component with llm.optimize"
                    )
                    if original_config is not None:
                        target_model.text.config.model_type = original_config
                    if added_model_wrapper and hasattr(target_model.text, "model"):
                        delattr(target_model.text, "model")
                except Exception as e:
                    logger.warning(
                        f"Failed to optimize llm component with ipex.llm.optimize: {str(e)}"
                    )
                    if original_config is not None and hasattr(
                        target_model.text.config, "model_type"
                    ):
                        target_model.text.config.model_type = original_config
                    if added_model_wrapper and hasattr(target_model.text, "model"):
                        delattr(target_model.text, "model")
                    logger.info(
                        "Falling back to standard optimization for this component"
                    )
                    try:
                        target_model.text = ipex.optimize(
                            target_model.text, dtype=self.dtype, **ipex_config
                        )
                        logger.info(
                            "Successfully optimized with standard ipex.optimize"
                        )
                    except Exception as e2:
                        logger.warning(
                            f"Failed to optimize llm component with standard ipex.optimize: {str(e2)}"
                        )
            else:
                logger.warning("llm component not found. Skipping LLM optimization.")
            if hasattr(target_model, "vision") and isinstance(
                target_model.vision, torch.nn.Module
            ):
                try:
                    target_model.vision = ipex.optimize(
                        target_model.vision, dtype=self.dtype, **ipex_config
                    )
                    logger.info("Successfully optimized vision component")
                except Exception as e:
                    logger.warning(f"Failed to optimize vision component: {str(e)}")
            else:
                logger.warning(
                    "Vision component not found. Skipping vision optimization."
                )
            if hasattr(target_model, "region") and isinstance(
                target_model.region, torch.nn.ModuleDict
            ):
                try:
                    for key, component in target_model.region.items():
                        if isinstance(component, torch.nn.Module) and not isinstance(
                            component, torch.nn.Parameter
                        ):
                            try:
                                target_model.region[key] = ipex.optimize(
                                    component, dtype=self.dtype, **ipex_config
                                )
                            except Exception as e:
                                logger.warning(
                                    f"Failed to optimize region component {key}: {str(e)}"
                                )
                        elif isinstance(component, torch.nn.ModuleDict):
                            for sub_key, sub_component in component.items():
                                try:
                                    target_model.region[key][sub_key] = ipex.optimize(
                                        sub_component, dtype=self.dtype, **ipex_config
                                    )
                                except Exception as e:
                                    logger.warning(
                                        f"Failed to optimize region sub_component {key}.{sub_key}: {str(e)}"
                                    )
                    logger.info("Successfully optimized region model components")
                except Exception as e:
                    logger.warning(
                        f"Error during region component optimization: {str(e)}"
                    )
            else:
                logger.warning(
                    "Region components not found. Skipping region optimization."
                )

        except Exception as e:
            logger.warning(f"Error during component-specific optimization: {str(e)}")
        try:
            model = ipex.optimize(model, dtype=self.dtype, **ipex_config)
            logger.info("Successfully applied general IPEX optimization")
        except Exception as e:
            logger.warning(f"Failed to apply general IPEX optimization: {str(e)}")
        return model

It would be awesome if I could do this more easily, at least at the level of individual components. A clear developer code example of how a model gets added to the ipex.llm.optimize supported list would let devs apply the same approach even to models that are not officially supported.

wangkl2 (Member) commented Apr 10, 2025

@unrahul It makes sense. Thanks for your feature request. I've escalated this to the dev team to see if we can simplify it and provide a dev guide in a future release. @tye1

unrahul (Author) commented Apr 10, 2025

Thank you! @wangkl2
