I was looking for ways to increase inference speed, and one thing I thought would help is FP16. To that end, I called model.half() after loading the model. Unfortunately, it raised RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'. Is there a way to use FP16 during inference (or any other trick to accelerate inference)?
# This works:
from transformers import MvpForConditionalGeneration, MvpTokenizer

tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")

inputs = tokenizer(
    ["Describe the following data: Iron Man | instance of | Superhero [SEP] Stan Lee | creator | Iron Man",
     "Describe the following data: Batman | instance of | Superhero",
    ],
    return_tensors="pt",
)
generated_ids = model.generate(**inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
# ['Iron Man is a fictional superhero appearing in American comic books published by Marvel Comics.',
#  "Batman is a superhero"]

# This doesn't:
model = model.half()
generated_ids = model.generate(**inputs)
# RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
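This error usually means the model is still on the CPU: PyTorch's CPU LayerNorm kernel has no FP16 implementation, so model.half() only works once the model and the inputs live on a CUDA device. A minimal sketch of that workaround, assuming a GPU is available (prompt shortened from the snippet above):

from transformers import MvpForConditionalGeneration, MvpTokenizer

tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")

# FP16 LayerNorm is only implemented for CUDA, so move the model to the GPU
# before (or together with) casting it to half precision.
model = model.half().to("cuda")

inputs = tokenizer(
    ["Describe the following data: Batman | instance of | Superhero"],
    return_tensors="pt",
).to("cuda")  # keep the inputs on the same device as the model

generated_ids = model.generate(**inputs)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))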
Sorry, I am not familiar with this. Our model is based on the Hugging Face API; you may find a solution in their GitHub issues or forum. Or perhaps Accelerate can help?
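For reference, transformers can also load a checkpoint directly in half precision, and with the accelerate package installed it can place the weights on the GPU automatically. A minimal sketch, assuming a CUDA GPU and reasonably recent transformers/accelerate versions:

import torch
from transformers import MvpForConditionalGeneration, MvpTokenizer

tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
# torch_dtype=torch.float16 loads the weights in FP16 from the start;
# device_map="auto" (requires accelerate) places them on the available GPU.
model = MvpForConditionalGeneration.from_pretrained(
    "RUCAIBox/mvp",
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer(
    ["Describe the following data: Batman | instance of | Superhero"],
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))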