GEMINI_TOOL breaks chain-of-thought reasoning #1285

dylanjcastillo · 2024-12-27T19:15:54Z

This is actually a bug report.
I am not getting good LLM Results
I have tried asking for help in the community on discord or discussions and have not received a response.
I have tried searching the documentation and have not found an answer.

What Model are you using?

gpt-3.5-turbo
gpt-4-turbo
gpt-4
Other (please specify): Gemini

Describe the bug
Gemini function calling doesn't preserve the key order of the provided schema, which significantly degrades performance on tasks that depend on chain-of-thought reasoning.

For example, on a sample of 200 questions from GSM8K, the two modes score as follows:

GEMINI_TOOL - Mean: 39.00% CI: 32.22% - 45.78%
GEMINI_JSON - Mean: 94.50% CI: 91.33% - 97.67%
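
For anyone who wants to sanity-check the intervals above: they are consistent with a 95% normal-approximation confidence interval computed with the sample standard deviation (n - 1 in the denominator). That is an inference from the numbers, not something stated in this report. A minimal sketch, where `mean_ci` is a hypothetical helper and the correct-answer counts (78 and 189 out of 200) are derived from the reported means:

```python
import math


def mean_ci(correct: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (mean, lower, upper) in percent for n binary (0/1) outcomes."""
    p = correct / n
    # Sample variance of 0/1 outcomes with Bessel's correction (ddof=1).
    sample_var = p * (1 - p) * n / (n - 1)
    # Half-width of the normal-approximation interval around the mean.
    half_width = z * math.sqrt(sample_var / n)
    return 100 * p, 100 * (p - half_width), 100 * (p + half_width)


# Counts are assumptions derived from the reported means: 39.00% and 94.50% of 200.
for mode, correct in [("GEMINI_TOOL", 78), ("GEMINI_JSON", 189)]:
    mean, lo, hi = mean_ci(correct, 200)
    print(f"{mode} - Mean: {mean:.2f}% CI: {lo:.2f}% - {hi:.2f}%")
```

Under these assumptions the script prints the same two lines as reported. The intervals do not overlap, so the gap is unlikely to be sampling noise.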

You can verify the key reordering by inspecting the text generated in _raw_response.
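
A rough sketch of that check, assuming you have already pulled the generated arguments out of `_raw_response` as a JSON string (how to do that depends on the mode and SDK version and is not shown here). The field names `reasoning` and `answer` and both helper functions are hypothetical, used only for illustration:

```python
import json


def key_order(raw_text: str) -> list[str]:
    """Return the top-level keys in the order they appear in the JSON text."""
    # json.loads builds a dict, which keeps insertion order in Python 3.7+.
    return list(json.loads(raw_text).keys())


def reasoning_precedes_answer(
    raw_text: str,
    reasoning_key: str = "reasoning",  # hypothetical field name
    answer_key: str = "answer",  # hypothetical field name
) -> bool:
    """True if the reasoning field was emitted before the answer field."""
    keys = key_order(raw_text)
    return keys.index(reasoning_key) < keys.index(answer_key)


# Illustrative strings, not real model output.
in_order = '{"reasoning": "2 + 2 = 4", "answer": 4}'
reordered = '{"answer": 4, "reasoning": "2 + 2 = 4"}'

print(reasoning_precedes_answer(in_order))
print(reasoning_precedes_answer(reordered))
```

If the answer key comes first, the model has committed to an answer before producing any reasoning, which would explain the accuracy drop.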

To Reproduce
Here's a notebook you can use to reproduce the result: https://github.com/dylanjcastillo/blog/blob/main/_extras/gemini-structured-outputs/gemini-structured-outputs-benchmarks-instructor.ipynb

I wrote a more detailed analysis here: http://dylancastillo.co/posts/gemini-structured-outputs.html

Expected behavior
Given that this is a fairly typical use case, I'd suggest making GEMINI_JSON the default mode and warning users about this issue.



github-actions bot added the "bug" label (Something isn't working) on Dec 27, 2024

dylanjcastillo linked a pull request Dec 27, 2024 that will close this issue

Use GEMINI_JSON as default #1286

Open
