Feature Request: Allow "Best-Effort" Optimization for Custom Models via ipex.llm.optimize
on XPUs
#807
Comments
@unrahul Thanks for your suggestions about the flexible support of
Hi @wangkl2, thank you. For example, moondream is an interesting model that I have seen many startups using, and it would benefit a lot from being optimized with `ipex.llm.optimize`. I would still strongly request at least a developer note on how to optimize unsupported models; as you know, there are hundreds of models, and an example would really help developers building custom inference solutions. It would also avoid the kind of non-standard wrapper code for optimizations that I am writing now. For example, here is how I am currently optimizing moondream:

```python
import logging

import torch
import intel_extension_for_pytorch as ipex
from transformers import PretrainedConfig

logger = logging.getLogger(__name__)


def _apply_component_specific_optimizations(self, model, ipex_config):
    """
    Apply component-specific optimizations to the model.
    Falls back gracefully if components are not found or errors occur.

    Args:
        model: The loaded model
        ipex_config: IPEX configuration dictionary

    Returns:
        The model with component-specific optimizations applied where possible
    """
    try:
        logger.info(f"Model Name: {type(model).__name__}")
        if hasattr(model, "model") and isinstance(model.model, torch.nn.Module):
            logger.info("Found nested model structure (HfMoondream.model)")
            target_model = model.model
        else:
            logger.info("Using direct model structure")
            target_model = model

        # Language-model component: masquerade as a supported family ("phi")
        # so that ipex.llm.optimize accepts it, then restore the real config.
        if hasattr(target_model, "text") and isinstance(
            target_model.text, torch.nn.Module
        ):
            logger.info(
                "Optimizing language model component with ipex.llm.optimize..."
            )
            text_model = target_model.text
            original_config = None
            if hasattr(text_model, "config"):
                if hasattr(text_model.config, "model_type"):
                    original_config = text_model.config.model_type
                    text_model.config.model_type = "phi"
                else:
                    text_model.config.model_type = "phi"
                if (
                    not hasattr(text_model.config, "architectures")
                    or not isinstance(text_model.config.architectures, list)
                    or len(text_model.config.architectures) == 0
                ):
                    text_model.config.architectures = ["PhiForCausalLM"]
                if not hasattr(text_model.config, "hidden_size"):
                    # Infer hidden_size from the first Linear layer found.
                    for name, module in text_model.named_modules():
                        if isinstance(module, torch.nn.Linear):
                            text_model.config.hidden_size = module.in_features
                            break
            else:
                text_model.config = PretrainedConfig()
                text_model.config.model_type = "phi"
                text_model.config.architectures = ["PhiForCausalLM"]

            # ipex.llm.optimize expects a `model.layers` structure; wrap the
            # moondream `blocks` ModuleList to match, and undo it afterwards.
            added_model_wrapper = False
            if hasattr(text_model, "blocks") and isinstance(
                text_model.blocks, torch.nn.ModuleList
            ):

                class ModelWrapper(torch.nn.Module):
                    def __init__(self, blocks):
                        super().__init__()
                        self.layers = blocks

                text_model.model = ModelWrapper(text_model.blocks)
                added_model_wrapper = True

            try:
                target_model.text = ipex.llm.optimize(
                    target_model.text,
                    dtype=self.dtype,
                    device=self.device.type,
                    inplace=True,
                )
                logger.info(
                    "Successfully optimized llm component with llm.optimize"
                )
                if original_config is not None:
                    target_model.text.config.model_type = original_config
                if added_model_wrapper and hasattr(target_model.text, "model"):
                    delattr(target_model.text, "model")
            except Exception as e:
                logger.warning(
                    f"Failed to optimize llm component with ipex.llm.optimize: {str(e)}"
                )
                if original_config is not None and hasattr(
                    target_model.text.config, "model_type"
                ):
                    target_model.text.config.model_type = original_config
                if added_model_wrapper and hasattr(target_model.text, "model"):
                    delattr(target_model.text, "model")
                logger.info(
                    "Falling back to standard optimization for this component"
                )
                try:
                    target_model.text = ipex.optimize(
                        target_model.text, dtype=self.dtype, **ipex_config
                    )
                    logger.info(
                        "Successfully optimized with standard ipex.optimize"
                    )
                except Exception as e2:
                    logger.warning(
                        f"Failed to optimize llm component with standard ipex.optimize: {str(e2)}"
                    )
        else:
            logger.warning("llm component not found. Skipping LLM optimization.")

        # Vision component: only the generic ipex.optimize applies here.
        if hasattr(target_model, "vision") and isinstance(
            target_model.vision, torch.nn.Module
        ):
            try:
                target_model.vision = ipex.optimize(
                    target_model.vision, dtype=self.dtype, **ipex_config
                )
                logger.info("Successfully optimized vision component")
            except Exception as e:
                logger.warning(f"Failed to optimize vision component: {str(e)}")
        else:
            logger.warning(
                "Vision component not found. Skipping vision optimization."
            )

        # Region components: a ModuleDict, possibly nested one level deep.
        if hasattr(target_model, "region") and isinstance(
            target_model.region, torch.nn.ModuleDict
        ):
            try:
                for key, component in target_model.region.items():
                    # ModuleDict is itself a Module, so it must be checked
                    # first; otherwise the generic branch swallows nested dicts.
                    if isinstance(component, torch.nn.ModuleDict):
                        for sub_key, sub_component in component.items():
                            try:
                                target_model.region[key][sub_key] = ipex.optimize(
                                    sub_component, dtype=self.dtype, **ipex_config
                                )
                            except Exception as e:
                                logger.warning(
                                    f"Failed to optimize region sub_component {key}.{sub_key}: {str(e)}"
                                )
                    elif isinstance(component, torch.nn.Module):
                        try:
                            target_model.region[key] = ipex.optimize(
                                component, dtype=self.dtype, **ipex_config
                            )
                        except Exception as e:
                            logger.warning(
                                f"Failed to optimize region component {key}: {str(e)}"
                            )
                logger.info("Successfully optimized region model components")
            except Exception as e:
                logger.warning(
                    f"Error during region component optimization: {str(e)}"
                )
        else:
            logger.warning(
                "Region components not found. Skipping region optimization."
            )
    except Exception as e:
        logger.warning(f"Error during component-specific optimization: {str(e)}")
        # Last resort: generic whole-model optimization.
        try:
            model = ipex.optimize(model, dtype=self.dtype, **ipex_config)
            logger.info("Successfully applied general IPEX optimization")
        except Exception as e:
            logger.warning(f"Failed to apply general IPEX optimization: {str(e)}")
    return model
```

It would be awesome if I could do this more easily, at least at the level of individual components. A clear developer code example of how a model gets added to the `ipex.llm.optimize` approved list would let devs apply the same approach even to models that are not yet supported.
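For contrast, here is roughly what the supported path looks like: a minimal sketch, assuming an XPU build of IPEX and a checkpoint from a family on the verified list (the Phi checkpoint name is just an illustrative example):

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed example of a verified architecture; any supported family would do.
model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model = model.eval().to("xpu")

# For verified families this single call applies the LLM-specific fusions;
# none of the wrapper code above is needed.
model = ipex.llm.optimize(model, dtype=torch.bfloat16, device="xpu", inplace=True)

inputs = tokenizer("Hello", return_tensors="pt").to("xpu")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```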
Thank you! @wangkl2
Describe the issue
Feature Request: Allow "Best-Effort" Optimization for Custom Models via `ipex.llm.optimize`
Motivation:

The `ipex.llm.optimize` API is powerful for accelerated inference with the supported LLM families on XPUs. However, its current design seems tightly coupled to these specific, verified architectures, and the generic `ipex.optimize` is often limited.

Problem Description:
Developers working with custom decoder models face challenges when trying to leverage `ipex.llm.optimize`. These models include:
Currently, applying `ipex.llm.optimize` to such models often requires non-trivial workarounds, such as modifying the model's `config.json` or using monkey-patching techniques to make the model appear as one of the supported types (sketched below). This process is indirect, adds development overhead, and isn't guaranteed to apply optimizations correctly.
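For concreteness, a minimal sketch of that workaround, assuming an XPU device and using `"phi"` as the stand-in family (both choices are illustrative, mirroring the wrapper code in the comment above):

```python
import torch
import intel_extension_for_pytorch as ipex

def patch_and_optimize(model, dtype=torch.bfloat16):
    """Masquerade an unsupported decoder as a verified family, then optimize."""
    original_type = getattr(model.config, "model_type", None)
    model.config.model_type = "phi"  # pretend to be a supported family
    model.config.architectures = ["PhiForCausalLM"]
    try:
        model = ipex.llm.optimize(model, dtype=dtype, device="xpu", inplace=True)
    finally:
        # Restore the model's real identity whether or not optimization worked.
        if original_type is not None:
            model.config.model_type = original_type
    return model
```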
Proposed Solution:

Introduce a pathway for `ipex.llm.optimize` to apply optimizations on a "best-effort" basis to models not explicitly listed as supported. This could involve (a usage sketch follows this list):

- A flag such as `attempt_optimization_on_unsupported=True` that could allow users to explicitly request optimization, acknowledging it might not be fully tuned or guaranteed.
- A generic path that treats the model as a `torch.nn.Module` and applies optimizations known to be generally applicable to transformer decoder blocks (e.g., optimizing linear layers, specific activation functions, KV caching if patterns are detected) without relying on exact model family identification.
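To make the request concrete, here is a purely hypothetical sketch of the first option; the `attempt_optimization_on_unsupported` kwarg and the `load_my_custom_decoder` helper do not exist in IPEX today:

```python
import torch
import intel_extension_for_pytorch as ipex

# Hypothetical: load some custom decoder-only model that is NOT on the
# verified list (the helper name is made up for illustration).
model = load_my_custom_decoder().eval().to("xpu")

# Hypothetical kwarg: this flag is the feature being requested; it is
# not part of the current ipex.llm.optimize signature.
model = ipex.llm.optimize(
    model,
    dtype=torch.bfloat16,
    device="xpu",
    inplace=True,
    attempt_optimization_on_unsupported=True,
)
```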
Benefits:

Conclusion:

Providing a mechanism, even if experimental or "best-effort," to apply `ipex.llm.optimize` to a wider range of decoder-like models would be a valuable addition for the community building and deploying custom AI solutions on Intel hardware.