Was wondering what the current progress is on the rewrite, and whether this issue could be turned into some sort of tracker for it? Optimizations for the P40 seem to be something many people would like.
I think V2 is in the works. I'm not sure whether it will support the P40, but then again, you have llama.cpp, which is all FP32, and I can run Q5_K_M and Q6 quants on it. If you apply the peer access patch, it even does direct GPU-to-GPU transfers on Linux. With NVLink, it's faster than exllama. It has some downsides in prompt processing and memory efficiency, but other than that, you can use it today.