Speed on A100 #266
Hi, thanks for the cool project.

I am testing Llama-2-70B-GPTQ with 1 * A100 40G, and the speed is around 9 t/s.

Is this the expected speed? I noticed in some other issues that the code is only optimized for consumer GPUs, but I just wanted to double-check whether that's the expected speed or I made a mistake somewhere.

Comments

I haven't tested 70B on an A100 before, but the speed is close to what I've seen for 65B on an A100, so I think this is about expected, yes.

To give you another data point, with 70B I get 10-13 t/s per A100 80 GB (SXM4).

I can't believe that the A100 gets the same speed as the 3090. Maybe something can be improved here?

There's definitely some room for improvement, but you're not going to see anything on the order of the difference in cost between the A100 and the 3090. When you're memory-bound, as you end up being here, what matters is that the A100 40G only has about 50-60% more global memory bandwidth than the 3090. So if the implementation is properly optimized and tuned for that architecture (ExLlama isn't, to be clear), then you're looking at 50-60% more tokens per second.

Now, if you're serving large batches, inference becomes compute-bound instead, and the A100 will outperform the 3090 very easily. But to serve large batches you also need a bunch more VRAM dedicated to state and cache. 40 GB won't get you very far, and even 80 GB is questionable.

What use-case are you optimizing for, then? One quantized 70B model serving no more than 8 concurrent users, or something? A small business willing to invest in one A100 but not two, or three? Or, if you're also trying to accommodate multi-A100 setups with tensor parallelism and whatnot, at what point does quantization stop making sense?

But yes, V2 is coming, and it's faster all around, including on the A100. So there's that.
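To make the memory-bound argument concrete, here is a rough back-of-the-envelope sketch of the single-batch decode ceiling implied by memory bandwidth alone. The bandwidth figures are published spec-sheet numbers, and the ~36 GiB weight footprint for a 4-bit 70B model is an assumed ballpark, not a measurement from this thread:

```python
# Back-of-the-envelope ceiling for memory-bound, single-batch decoding:
# each generated token streams (roughly) all quantized weights through
# global memory, so tokens/s <= bandwidth / weight_bytes.
# All figures below are approximate spec-sheet / ballpark values.

GIB = 1024**3
weights_gib = 36  # ~4-bit 70B GPTQ weights (assumed ballpark)

bandwidth_gbs = {
    "RTX 3090":      936,   # GDDR6X; a 24 GB card can't hold 70B alone, shown for the ratio only
    "A100 40GB":     1555,  # HBM2
    "A100 80GB SXM": 2039,  # HBM2e
}

for gpu, bw in bandwidth_gbs.items():
    ceiling = bw * 1e9 / (weights_gib * GIB)
    print(f"{gpu:>14}: <= {ceiling:4.1f} tokens/s (theoretical ceiling)")
```

The 9-13 t/s reported in this thread sits well below those ceilings, since attention over the KV cache, dequantization and kernel overhead all cost time as well, but the point stands: at batch size 1 the achievable ratio between GPUs tracks the bandwidth ratio, not the price ratio.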