Activity
Remove exllama backend, pending further fixes
Only import big python modules for GPTQ once they get used
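A minimal sketch of the deferred-import pattern this entry describes: the heavy dependency is imported inside the load path rather than at startup, so users who never load a GPTQ model don't pay the import cost. The `gptq` module and `load_quant` function are illustrative stand-ins, not the repo's actual names.

```python
def load_gptq_model(path):
    # Deferred import: the big module is only pulled in when a GPTQ
    # model is actually loaded, keeping application startup fast.
    from gptq import load_quant  # hypothetical module/function names

    return load_quant(path)
```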
Automatically install exllama module
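The auto-install behaviour likely follows the common try-import-then-pip pattern sketched below; the package spec passed to pip is an assumption, since the actual commit may pin a specific wheel or git revision.

```python
import importlib
import subprocess
import sys

def ensure_exllama():
    # Import on demand; install into the current interpreter only if
    # the module is missing, then retry the import.
    try:
        return importlib.import_module("exllama")
    except ImportError:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "exllama"]  # assumed package spec
        )
        return importlib.import_module("exllama")
```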
Merge remote-tracking branch 'upstream/united' into 4bit-plugin
Fallback to transformers if hf_bleeding_edge not available
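This is the standard import-fallback idiom; assuming hf_bleeding_edge mirrors the transformers auto-class names, the swap is a one-liner:

```python
# Prefer the hf_bleeding_edge wrapper when installed, otherwise fall
# back to stock transformers (symbol name assumed to match).
try:
    from hf_bleeding_edge import AutoModelForCausalLM
except ImportError:
    from transformers import AutoModelForCausalLM
```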
Load GPTQ module from GPTQ repo docs
Merge upstream changes, fix conflict
Merge upstream changes, fix conflict, adapt backends to changes
Fix non-tuple return from gptq function
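A likely shape of the fix, sketched under the assumption that callers unpack the loader's result as a tuple while one code path returned a bare model:

```python
def unpack_model(result):
    # Normalize: accept either a bare model or a (model, extras...)
    # tuple so downstream unpacking works for both return shapes.
    if not isinstance(result, tuple):
        result = (result,)
    return result[0]
```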
Add exllama superhot positional embeddings compression support
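SuperHOT-style context extension scales rotary position indices down so an extended context (e.g. 8k tokens) maps into the model's original training range (2k). A sketch against exllama's config, with the import path and attribute names taken as assumptions that may differ across versions:

```python
from exllama.model import ExLlamaConfig  # import path varies by packaging

config = ExLlamaConfig("config.json")
config.max_seq_len = 8192
# Compress positional embeddings by 4x: 8192 / 2048 original context.
config.compress_pos_emb = 4.0
```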
Remove rocm gptq install from environments file
Disable scaled_dot_product_attention if torch version < 2
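torch.nn.functional.scaled_dot_product_attention only exists in PyTorch 2.0+, so the guard is a simple version (or attribute) check, sketched here:

```python
import torch
from packaging import version

# Gate SDPA usage on the runtime's torch version; parse() handles
# local version suffixes such as "2.0.1+cu118".
use_sdpa = version.parse(torch.__version__) >= version.parse("2.0")
# Equivalent feature test:
# use_sdpa = hasattr(torch.nn.functional, "scaled_dot_product_attention")
```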
Track token generation progress
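Progress tracking during generation typically reports tokens produced so far against the requested total. A minimal, self-contained sketch of the pattern; the callback hook is illustrative, not the repo's API:

```python
def generate(step, max_new_tokens, on_progress=None):
    tokens = []
    for i in range(max_new_tokens):
        tokens.append(step())  # produce one token (stub)
        if on_progress:
            on_progress(i + 1, max_new_tokens)
    return tokens

# Usage: prints "1/4" ... "4/4" as tokens are emitted.
generate(lambda: 0, 4, on_progress=lambda done, total: print(f"{done}/{total}"))
```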
Fix AMD ROCm exllama inference
Add v2 with bias support (e.g. for Tulu-30b)
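Some checkpoints, such as Tulu-30b, include bias terms in their linear layers, so the quantized layer must add the bias after the matmul. A simplified sketch of the idea; the real v2 layer runs a quantized kernel rather than a dense matmul:

```python
import torch

class QuantLinear(torch.nn.Module):
    # Simplified stand-in for a quantized linear layer with optional bias.
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.zeros(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x):
        out = x @ self.weight.t()  # dense matmul stands in for the quant kernel
        if self.bias is not None:
            out = out + self.bias
        return out
```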