How to decrease time to generate first token? #297
Unanswered
VenkatLohithDasari asked this question in Q&A
Replies: 0 comments
I copied the code from example_ws.py to enable text streaming. It works well, but there is one big problem: it takes a long time to generate the first token, while the rest of the tokens come out fairly fast, around 15 t/s. Is there any way to fix this?
My chatbot works like this: it takes the user message, detects the intent of the message, then builds an appropriate prompt for that intent using f-strings. So the prompt always changes depending on context. I'm mentioning this in case it's relevant to why the first token is slow.
I'd like to know what generating the first token actually depends on. If there is some sort of conversion of the prompt into math before generation starts, maybe we could cache that result in a variable and only convert the newly appended part of the prompt? Is that possible? Sorry if I sound dumb, I'm not an AI programmer...
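The idea described above, reusing the work done for an unchanged prompt prefix and only running the model over newly appended tokens, is essentially what KV-cache (prefix) reuse does. Below is a minimal sketch of the bookkeeping involved; `PrefixCache` and `_prefill` are hypothetical names, not part of example_ws.py or any real backend, and `_prefill` merely stands in for the model's expensive prompt-processing pass.

```python
class PrefixCache:
    """Sketch of prefix caching: only tokens appended after the cached
    prompt prefix need a new (expensive) forward pass."""

    def __init__(self):
        self.tokens = []      # tokens already processed
        self.state = None     # stand-in for the model's KV cache
        self.processed = 0    # how many tokens we actually prefilled

    def _prefill(self, tokens, state=None):
        # Hypothetical stand-in for the model's prefill pass; a real
        # backend would run attention here and return the updated KV cache.
        self.processed += len(tokens)
        state = list(state) if state is not None else []
        state.extend(tokens)
        return state

    def process(self, tokens):
        # Length of the shared prefix between the new prompt and the cached one.
        n = 0
        while n < min(len(self.tokens), len(tokens)) and self.tokens[n] == tokens[n]:
            n += 1
        if n == len(self.tokens) and self.state is not None:
            # Cache hit: only the appended suffix needs a forward pass.
            self.state = self._prefill(tokens[n:], self.state)
        else:
            # Prompt diverged (e.g. the f-string prefix changed): the cached
            # state past the divergence point is invalid, so reprocess fully.
            self.state = self._prefill(tokens)
        self.tokens = list(tokens)
        return self.state
```

Note the caveat in the `else` branch: because the prompt here is rebuilt per intent with f-strings, any change near the start of the prompt invalidates the cache, so this only helps when the changing part (e.g. the latest user message) is appended at the end.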