abukhoy
Contributor

@abukhoy abukhoy commented Aug 4, 2025

This pull request adds support for passing compile-time options as keyword arguments (kwargs), including the aic-hw-version parameter, which accepts "ai100" or "ai200". If no value is provided, it defaults to "ai100", the AI100 hardware.

This lets users pass hardware-specific options through the compile API instead of relying on fixed defaults.

Example Usage:

from QEfficient import QEFFAutoModelForCausalLM
from transformers import AutoTokenizer

model_name = "gpt2"
model = QEFFAutoModelForCausalLM.from_pretrained(model_name, num_hidden_layers=2)

model.compile(prefill_seq_len=128, ctx_len=256, num_cores=16, num_devices=1, **{'aic-hw-version': 'ai100'})

tokenizer = AutoTokenizer.from_pretrained(model_name)
model.generate(prompts=["Hi there!!"], tokenizer=tokenizer)
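The option name aic-hw-version contains hyphens, so it is not a valid Python identifier and cannot be written as a plain keyword argument; that is why the example unpacks a dict. A minimal sketch of the mechanism follows. Here `compile_stub` and `to_flags` are hypothetical stand-ins, not QEfficient functions, and the `-name=value` flag format is an assumption about how such options might be forwarded to a compiler command line.

```python
def compile_stub(prefill_seq_len=32, ctx_len=128, **compiler_options):
    """Hypothetical stand-in for a compile method: returns the extra options it received."""
    return compiler_options


def to_flags(options):
    """Turn an options dict into command-line style flags (assumed format: -name=value)."""
    flags = []
    for key, value in options.items():
        name = "-" + key.replace("_", "-")  # accept both aic_hw_version and aic-hw-version
        if isinstance(value, bool):
            if value:  # boolean options become bare switches when True
                flags.append(name)
        else:
            flags.append(f"{name}={value}")
    return flags


# Hyphenated keys must be passed through dict unpacking.
opts = compile_stub(prefill_seq_len=128, ctx_len=256, **{"aic-hw-version": "ai100"})
flags = to_flags(opts)
```

Writing the key with underscores (`aic_hw_version="ai100"`) would also work with this sketch, since `to_flags` converts underscores to hyphens; whether the real API does the same is not stated in this thread.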

Note: Previously, the default value for aic-hw-version was "2.0", which implicitly referred to AI100. This value is now deprecated and replaced with the explicit "ai100" identifier.
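One way the defaulting and deprecation described in the note could be handled is sketched below. This is an illustration only: `normalize_hw_version` and the constants are hypothetical names, and mapping the legacy "2.0" to "ai100" with a warning (rather than rejecting it outright) is an assumption, not something this pull request confirms.

```python
import warnings

SUPPORTED_HW_VERSIONS = ("ai100", "ai200")
LEGACY_ALIASES = {"2.0": "ai100"}  # assumed mapping for the deprecated value
DEFAULT_HW_VERSION = "ai100"


def normalize_hw_version(value=None):
    """Return a supported hardware identifier, applying the default and legacy alias."""
    if value is None:
        return DEFAULT_HW_VERSION
    value = str(value)
    if value in LEGACY_ALIASES:
        replacement = LEGACY_ALIASES[value]
        warnings.warn(
            f"aic-hw-version={value!r} is deprecated; use {replacement!r} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return replacement
    if value not in SUPPORTED_HW_VERSIONS:
        raise ValueError(
            f"unsupported aic-hw-version {value!r}; expected one of {SUPPORTED_HW_VERSIONS}"
        )
    return value
```

Validating the value up front means a typo such as "ai10" fails with a clear error in Python rather than surfacing later as a compiler failure.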

@abukhoy
Contributor Author

abukhoy commented Aug 6, 2025

I made a small change to the _compile function of the base class by adding a helper method. If that's not okay, I will revert it.

@quic-hemagnih
Contributor

Is anything pending on this? I think we are good to merge this change.

@quic-rishinr
Contributor

> Is anything pending on this? I think we are good to merge this change.

Yes, the compiler changes need to be merged before we add this change to Qeff.

4 participants