Allow model_max_tokens to be set to whatever the LLM maximum is #1233
Comments
Uhm, silly me, I could have just calculated what the extra token count is.
Which also makes me realize that passing the entirety of the codebase to the LLM (which I think is what …
@iuliaturc I would recommend using the Gemini models for this patchflow. But we'll take a look at the tokens too! The template patchflow was meant to be a naive implementation - we do have some avenues to compress the code2prompt output.
Nice, do you have any docs or official recommendations on how to compress the code2prompt output?
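(Not an official recommendation, just to make the question concrete: one naive way to shrink a code2prompt-style dump is to hard-truncate it to a token budget. The use of tiktoken, the encoding name, and the budget value below are my own assumptions, not part of patchwork.)

```python
# Illustrative only: truncate a code2prompt-style repository dump to a token budget.
# Assumes tiktoken is installed; the encoding name and budget are arbitrary choices.
import tiktoken

def truncate_to_budget(prompt: str, budget: int, encoding_name: str = "cl100k_base") -> str:
    """Keep only the first `budget` tokens of the prompt."""
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(prompt)
    if len(tokens) <= budget:
        return prompt
    return enc.decode(tokens[:budget])

# Example: cap the repository dump so the final request stays under the context window.
# code2prompt_output = open("repo_dump.txt").read()
# trimmed = truncate_to_budget(code2prompt_output, budget=100_000)
```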
Description
I'm trying to use GenerateREADME and maximize the underlying LLM's context window. Unfortunately, I can't easily figure out what that magical value is, because model_max_tokens isn't the length of the final input sent to the LLM.

For instance, I'm trying to consume the entire 128k context window, and I'm running a bunch of trials:
patchwork GenerateREADME ... model_max_tokens=128_000
===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 255511 tokens
patchwork GenerateREADME ... model_max_tokens=64_000
===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 191511 tokens
patchwork GenerateREADME ... model_max_tokens=30_000
===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 157511 tokens
So I need to keep guessing.
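A back-of-the-envelope check on the three errors above (arithmetic derived only from the numbers they report) shows the hidden overhead is constant, so the safe value could in principle be derived rather than guessed:

```python
# Requested totals reported by the API vs. the model_max_tokens that was passed.
trials = [(128_000, 255_511), (64_000, 191_511), (30_000, 157_511)]

# The gap is identical in every run: the prompt/overhead tokens sent under the hood.
overheads = [total - requested for requested, total in trials]
print(overheads)               # [127511, 127511, 127511]

# So the largest model_max_tokens that fits a 128k window would be roughly:
print(128_000 - overheads[0])  # 489
```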
Proposed solution
Have an option to, e.g., set model_max_tokens=-1, which would mean the maximum window allowed by the underlying LLM, once all the other tokens you're sending under the hood are accounted for (see the sketch below, after the alternatives).

Alternatives considered
n/a
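A minimal sketch of what the proposed model_max_tokens=-1 behavior could look like, assuming the assembled prompt can be token-counted with tiktoken before the request is sent; the function name, margin, and encoding below are hypothetical, not patchwork's actual API:

```python
# Hypothetical resolution of model_max_tokens=-1: give the completion whatever
# is left of the context window after the prompt has been accounted for.
# Assumes tiktoken; names here are illustrative, not patchwork's actual API.
import tiktoken

def resolve_max_tokens(requested: int, prompt: str, context_window: int,
                       encoding_name: str = "cl100k_base", margin: int = 64) -> int:
    if requested != -1:
        return requested
    prompt_tokens = len(tiktoken.get_encoding(encoding_name).encode(prompt))
    # Leave a small margin so tokenizer differences don't push the request over the limit.
    return max(context_window - prompt_tokens - margin, 1)

# Example: with a 128k window and a ~127.5k-token prompt (as in the errors above),
# only a few hundred completion tokens remain.
```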