Description
Sorry, I was unable to find any relevant documentation on the Internet about how to load all modules onto the GPU.
I get this error message from my code:
Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting disable_exllama=True in the quantization config object
Here is a snippet from my code (to make it work I had to uncomment the config part, but then it won't be using Exllama):
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TheBloke/Llama-2-13b-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# Uncommenting these two lines (and the config= argument below) avoids the
# error, but then Exllama is disabled:
# config = AutoConfig.from_pretrained(MODEL_ID)
# config.quantization_config["disable_exllama"] = True
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    # config=config,
)
Any help is greatly appreciated!