Progress on the rewrite for older cards (Like the P40) #279

Open
TimyIsCool opened this issue Sep 8, 2023 · 1 comment

Comments

@TimyIsCool

I was wondering what the current progress on the rewrite is, and whether this issue could be turned into some sort of tracker for it. Optimizations for the P40 seem to be something many people would like.

@Ph0rk0z
Contributor

Ph0rk0z commented Sep 10, 2023

I think V2 is in the works. I'm not sure whether it will support the P40, but then again, you have llama.cpp, which is all FP32, and I can run Q5_K_M and Q6 quants on it. If you apply the peer access patch, it even does direct GPU-to-GPU transfers on Linux. With NVLink, it's faster than ExLlama. It has some downsides in how it processes prompts and in memory efficiency, but other than that, you can use it today.
