Replies: 1 comment
-
There's an alternative generator you can use that lets you stream responses. A usage example is here, at the bottom.
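To illustrate the general difference between a blocking call and a streaming one, here is a minimal sketch. It does not use the library's actual API: `fake_model_tokens`, `generate_blocking`, and `generate_streaming` are hypothetical stand-ins, and the per-token delay is simulated. The point it shows is that streaming shortens the wait for the first token, while the full text still takes about as long to finish.

```python
import time
from typing import Iterator, List

# Hypothetical stand-in for a model that emits one token at a time.
# The name and the delay are illustrative, not part of any real library.
def fake_model_tokens(prompt: str, delay_s: float = 0.01) -> Iterator[str]:
    for word in ("Streaming", " shows", " tokens", " as", " they", " arrive", "."):
        time.sleep(delay_s)  # simulate per-token compute
        yield word

def generate_blocking(prompt: str) -> str:
    """Return the full text only after every token has been produced."""
    return "".join(fake_model_tokens(prompt))

def generate_streaming(prompt: str) -> Iterator[str]:
    """Yield each token as soon as it is available."""
    yield from fake_model_tokens(prompt)

if __name__ == "__main__":
    start = time.perf_counter()
    first_token_at = None
    pieces: List[str] = []
    for piece in generate_streaming("hello"):
        if first_token_at is None:
            first_token_at = time.perf_counter() - start
        print(piece, end="", flush=True)  # show text as it arrives
        pieces.append(piece)
    total = time.perf_counter() - start
    print(f"\nfirst token after {first_token_at:.3f}s, full text after {total:.3f}s")
```

Both functions produce the same text; only the moment at which the caller first sees output differs.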
-
Hi, I've been using your snippets and also tried the web version, where I saw that you have streaming responses enabled. That's cool because it makes the response feel like it starts immediately after the prompt. But if I call the generator directly with generate_simple, I get the response back about 30 seconds later, which is the same time the streamed response takes to finish. Is there any way to speed up the response without streaming?
I'm using 2x L4 GPUs (24 GB VRAM); utilization is around 60% on each during prompting.
Thank you