
Feature Request: Allow "Best-Effort" Optimization for Custom Models via ipex.llm.optimize on XPUs #807


Open
unrahul opened this issue Apr 8, 2025 · 4 comments
Assignees: wangkl2
Labels: Feature, LLM, XPU/GPU (XPU/GPU specific issues)

Comments

unrahul commented Apr 8, 2025

Describe the issue

Feature Request: Allow "Best-Effort" Optimization for Custom Models via ipex.llm.optimize

Motivation:
The ipex.llm.optimize API is powerful for accelerated inference of supported LLM families on XPUs. However, its current design seems tightly coupled to these specific, verified architectures, and the more general ipex.optimize is often limited in what it can do for LLMs.

Problem Description:
Developers working with custom decoder models face challenges when trying to leverage ipex.llm.optimize.
These models include:

  • Smaller, domain-specific decoders tailored for particular tasks.
  • Decoder components within larger Vision Language Models (VLMs).
  • Novel architectures developed during research.

Currently, applying ipex.llm.optimize to such models often requires non-trivial workarounds, such as modifying the model's config.json or using monkey-patching techniques to make the model appear as one of the supported types. This process is indirect, adds development overhead, and isn't guaranteed to apply optimizations correctly.
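For illustration, here is a rough sketch of the kind of workaround this currently requires; the model class, repo name, and the choice of "llama" as the impersonated family are all hypothetical:

    import torch
    import intel_extension_for_pytorch as ipex

    model = MyCustomDecoder.from_pretrained("my-org/my-decoder")  # hypothetical
    original_model_type = getattr(model.config, "model_type", None)

    # Masquerade as a supported family so ipex.llm.optimize accepts the model.
    model.config.model_type = "llama"
    model.config.architectures = ["LlamaForCausalLM"]

    model = ipex.llm.optimize(model, dtype=torch.bfloat16, device="xpu", inplace=True)

    # Restore the original identity afterwards.
    if original_model_type is not None:
        model.config.model_type = original_model_type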

Proposed Solution:
Introduce a pathway for ipex.llm.optimize to apply optimizations on a "best-effort" basis to models not explicitly listed as supported. This could involve:

  1. An Opt-in Mechanism: A boolean flag like attempt_optimization_on_unsupported=True could allow users to explicitly request optimization, acknowledging it might not be fully tuned or guaranteed (see the sketch after this list).
  2. Heuristic-Based Optimization: The optimizer could inspect the provided torch.nn.Module and apply optimizations known to be generally applicable to transformer decoder blocks (e.g., optimizing linear layers, specific activation functions, KV caching if patterns are detected) without relying on exact model family identification.
  3. User Hints (Optional): Potentially allow users to provide basic hints about the model structure if needed (though a fully automatic approach is preferred).
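To make point 1 concrete, here is a sketch of how the opt-in call might look; attempt_optimization_on_unsupported is the flag proposed above, not an existing parameter:

    import torch
    import intel_extension_for_pytorch as ipex

    # Proposed usage -- attempt_optimization_on_unsupported does not exist today.
    model = ipex.llm.optimize(
        my_custom_decoder,  # any torch.nn.Module containing decoder blocks
        dtype=torch.bfloat16,
        device="xpu",
        attempt_optimization_on_unsupported=True,
    )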

Benefits:

  • Reduced Friction: Lowers the barrier for developers to experiment with IPEX optimizations on custom models.
  • Faster Iteration: Enables quicker testing and deployment of optimized custom architectures.
  • Broader Applicability: Extends the reach and utility of IPEX optimizations beyond the core supported model list.
  • Flexibility: Allows optimizing components (like VLM decoders) independently.

Conclusion:
Providing a mechanism, even if experimental or "best-effort," to apply ipex.llm.optimize to a wider range of decoder-like models would be a valuable addition for the community building and deploying custom AI solutions on Intel hardware.

wangkl2 added the XPU/GPU, Feature, and LLM labels Apr 9, 2025
wangkl2 self-assigned this Apr 9, 2025
wangkl2 (Member) commented Apr 9, 2025

@unrahul Thanks for your suggestions about flexible support of the ipex.llm.optimize API for customized models.

  • For models that are not in our support list, it should theoretically still function: invoking ipex.llm.optimize falls back to applying ipex.optimize to try to get whatever perf benefits are available (see the sketch after this list).
  • For CPU, we provide a guide on leveraging the LLM module-level optimization APIs in https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm-modeling to help optimize private/customized LLMs with commonly used modules or functions via ipex.llm.
  • For XPU, we keep enabling and adding well-optimized LLMs/model families to our support scope in each release.
  • You can refer to our commits for the support of new LLMs, such as 94379e2 and c45a3c1, to see how to enable your customized models with the optimized transformers. Contributions from the community are warmly welcomed.
  • And please let us know if there are any specific models/model architectures that you hope to be supported/optimized.
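For reference, a minimal sketch of the fallback path described in the first bullet, assuming an arbitrary HF causal-LM checkpoint (the repo name is hypothetical):

    import torch
    import intel_extension_for_pytorch as ipex
    from transformers import AutoModelForCausalLM

    # Any decoder checkpoint; if it is outside the verified list, ipex.llm.optimize
    # is expected to fall back to generic optimizations rather than fail.
    model = AutoModelForCausalLM.from_pretrained("my-org/custom-decoder")  # hypothetical
    model = model.eval().to("xpu")
    model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")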

unrahul (Author) commented Apr 9, 2025

hi @wangkl2, thank you. For example, moondream is an interesting model that I have seen many startups using, and it would benefit a lot from being optimized with ipex.llm.optimize.

I would still strongly request at least a developer note on how to optimize non-supported models. As you know, there are hundreds of models, and it would really help developers building custom inference solutions to have an example of how to optimize on XPUs using building blocks, similar to the CPU section.

This would avoid non-standard wrapper code for optimizations, which is what I am writing now.

For example, here is how I am optimizing moondream:

    # Requires: logging (logger), torch, intel_extension_for_pytorch as ipex,
    # and transformers.PretrainedConfig (used as a fallback config below).
    def _apply_component_specific_optimizations(self, model, ipex_config):
        """
        Apply component-specific optimizations to the model.
        Falls back gracefully if components are not found or errors occur.

        Args:
            model: The loaded model
            ipex_config: IPEX configuration dictionary

        Returns:
            The model with component-specific optimizations applied where possible
        """
        try:
            logger.info(f"Model Name: {type(model).__name__}")
            if hasattr(model, "model") and isinstance(model.model, torch.nn.Module):
                logger.info("Found nested model structure (HfMoondream.model)")
                target_model = model.model
            else:
                logger.info("Using direct model structure")
                target_model = model
            if hasattr(target_model, "text") and isinstance(
                target_model.text, torch.nn.Module
            ):
                logger.info(
                    "Optimizing language model component with ipex.llm.optimize..."
                )
                text_model = target_model.text
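                # Masquerade as a supported family ("phi") so ipex.llm.optimize
                # accepts this custom decoder; the original value is restored later.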
                original_config = None
                if hasattr(text_model, "config"):
                    if hasattr(text_model.config, "model_type"):
                        original_config = text_model.config.model_type
                    text_model.config.model_type = "phi"

                    if (
                        not hasattr(text_model.config, "architectures")
                        or not isinstance(text_model.config.architectures, list)
                        or len(text_model.config.architectures) == 0
                    ):
                        text_model.config.architectures = ["PhiForCausalLM"]
                    if not hasattr(text_model.config, "hidden_size"):
                        for name, module in text_model.named_modules():
                            if isinstance(module, torch.nn.Linear):
                                text_model.config.hidden_size = module.in_features
                                break
                else:
                    text_model.config = PretrainedConfig()
                    text_model.config.model_type = "phi"
                    text_model.config.architectures = ["PhiForCausalLM"]
                added_model_wrapper = False
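                # ipex.llm.optimize appears to look for decoder layers under
                # model.layers (the HF convention), so expose moondream's
                # blocks there via a thin wrapper.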
                if hasattr(text_model, "blocks") and isinstance(
                    text_model.blocks, torch.nn.ModuleList
                ):

                    class ModelWrapper(torch.nn.Module):
                        def __init__(self, blocks):
                            super().__init__()
                            self.layers = blocks

                    text_model.model = ModelWrapper(text_model.blocks)
                    added_model_wrapper = True

                try:
                    target_model.text = ipex.llm.optimize(
                        target_model.text,
                        dtype=self.dtype,
                        device=self.device.type,
                        inplace=True,
                    )
                    logger.info(
                        "Successfully optimized llm component with llm.optimize"
                    )
                    if original_config is not None:
                        target_model.text.config.model_type = original_config
                    if added_model_wrapper and hasattr(target_model.text, "model"):
                        delattr(target_model.text, "model")
                except Exception as e:
                    logger.warning(
                        f"Failed to optimize llm component with ipex.llm.optimize: {str(e)}"
                    )
                    if original_config is not None and hasattr(
                        target_model.text.config, "model_type"
                    ):
                        target_model.text.config.model_type = original_config
                    if added_model_wrapper and hasattr(target_model.text, "model"):
                        delattr(target_model.text, "model")
                    logger.info(
                        "Falling back to standard optimization for this component"
                    )
                    try:
                        target_model.text = ipex.optimize(
                            target_model.text, dtype=self.dtype, **ipex_config
                        )
                        logger.info(
                            "Successfully optimized with standard ipex.optimize"
                        )
                    except Exception as e2:
                        logger.warning(
                            f"Failed to optimize llm component with standard ipex.optimize: {str(e2)}"
                        )
            else:
                logger.warning("llm component not found. Skipping LLM optimization.")
            if hasattr(target_model, "vision") and isinstance(
                target_model.vision, torch.nn.Module
            ):
                try:
                    target_model.vision = ipex.optimize(
                        target_model.vision, dtype=self.dtype, **ipex_config
                    )
                    logger.info("Successfully optimized vision component")
                except Exception as e:
                    logger.warning(f"Failed to optimize vision component: {str(e)}")
            else:
                logger.warning(
                    "Vision component not found. Skipping vision optimization."
                )
            if hasattr(target_model, "region") and isinstance(
                target_model.region, torch.nn.ModuleDict
            ):
                try:
                    for key, component in target_model.region.items():
                        if isinstance(component, torch.nn.Module) and not isinstance(
                            component, torch.nn.Parameter
                        ):
                            try:
                                target_model.region[key] = ipex.optimize(
                                    component, dtype=self.dtype, **ipex_config
                                )
                            except Exception as e:
                                logger.warning(
                                    f"Failed to optimize region component {key}: {str(e)}"
                                )
                        elif isinstance(component, torch.nn.ModuleDict):
                            for sub_key, sub_component in component.items():
                                try:
                                    target_model.region[key][sub_key] = ipex.optimize(
                                        sub_component, dtype=self.dtype, **ipex_config
                                    )
                                except Exception as e:
                                    logger.warning(
                                        f"Failed to optimize region sub_component {key}.{sub_key}: {str(e)}"
                                    )
                    logger.info("Successfully optimized region model components")
                except Exception as e:
                    logger.warning(
                        f"Error during region component optimization: {str(e)}"
                    )
            else:
                logger.warning(
                    "Region components not found. Skipping region optimization."
                )

        except Exception as e:
            logger.warning(f"Error during component-specific optimization: {str(e)}")
        try:
            model = ipex.optimize(model, dtype=self.dtype, **ipex_config)
            logger.info("Successfully applied general IPEX optimization")
        except Exception as e:
            logger.warning(f"Failed to apply general IPEX optimization: {str(e)}")
        return model

It would be awesome if I could do this more easily, at least at the level of individual components. A clear developer code example of how a model gets added to the ipex.llm.optimize supported list would let devs apply the same approach even to models that are not officially supported.

wangkl2 (Member) commented Apr 10, 2025

@unrahul It makes sense. Thanks for your feature request. I've escalated this to the dev team to see if we can simplify it and provide a dev guide in a future release. @tye1

unrahul (Author) commented Apr 10, 2025

Thank you! @wangkl2
