Inference speed #294

jchavezar · 2023-09-21T17:59:26Z

Hi, I've been using your snippets and tried the web version. I saw that you have streaming responses enabled, which is cool because it makes the response feel immediate after the prompt. But if I call the generator directly with generate_simple, I get the response back about 30 seconds later, which is how long the streamed response takes to finish. Is there any way to speed up the response without streaming?

I'm using 2x L4 GPUs (24 GB VRAM each); utilization is around 60% on each during prompting.

Thank you

turboderp · 2023-09-21T20:00:00Z

Maintainer

There's an alternative generator you can use that lets you stream responses. An example of its usage is here, at the bottom.
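To illustrate the difference being discussed, here is a minimal sketch that does not use the library's actual API. It assumes a hypothetical `fake_token_stream` that stands in for the model, and compares a blocking call (which returns only once all tokens exist) with a streaming loop (which hands over each token as soon as it is ready). Every name in it is invented for illustration. In this toy setup the total time is about the same either way; streaming only shortens the wait until the first token appears.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple


def fake_token_stream(tokens: List[str], delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields one token every `delay_s` seconds."""
    for tok in tokens:
        time.sleep(delay_s)  # simulated per-token decode time
        yield tok


def generate_blocking(tokens: List[str], delay_s: float) -> str:
    """Return the full text only after every token has been produced."""
    return "".join(fake_token_stream(tokens, delay_s))


def generate_streaming(
    tokens: List[str],
    delay_s: float,
    on_chunk: Callable[[str], None],
) -> Tuple[str, Optional[float], float]:
    """Pass each token to `on_chunk` as soon as it is ready.

    Returns (full_text, seconds_until_first_token, total_seconds).
    """
    start = time.perf_counter()
    first_token_s: Optional[float] = None
    parts: List[str] = []
    for tok in fake_token_stream(tokens, delay_s):
        if first_token_s is None:
            first_token_s = time.perf_counter() - start
        on_chunk(tok)  # e.g. print(tok, end="", flush=True)
        parts.append(tok)
    total_s = time.perf_counter() - start
    return "".join(parts), first_token_s, total_s


if __name__ == "__main__":
    demo_tokens = ["Hello", ",", " world", "!"]
    text, first_s, total_s = generate_streaming(
        demo_tokens, 0.05, lambda t: print(t, end="", flush=True)
    )
    print(f"\nfirst token after {first_s:.2f}s, full text after {total_s:.2f}s")
```

Both functions do the same amount of simulated work, so this sketch only models perceived latency; it says nothing about how to raise actual tokens-per-second throughput on a given GPU setup.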
