GEMINI_TOOL breaks chain-of-thought reasoning #1285

Open
4 of 8 tasks
dylanjcastillo opened this issue Dec 27, 2024 · 0 comments · May be fixed by #1286
Labels
bug Something isn't working

dylanjcastillo commented Dec 27, 2024

  • This is actually a bug report.
  • I am not getting good LLM Results
  • I have tried asking for help in the community on discord or discussions and have not received a response.
  • I have tried searching the documentation and have not found an answer.

What Model are you using?

  • gpt-3.5-turbo
  • gpt-4-turbo
  • gpt-4
  • Other (please specify) Gemini

Describe the bug
Gemini function calling doesn't preserve the key order of the schema provided, which significantly reduces performance on tasks that depend on chain-of-thought reasoning.

For example, for a sample of 200 questions from GSM8K, you get this performance difference:

GEMINI_TOOL - Mean: 39.00% CI: 32.22% - 45.78%
GEMINI_JSON - Mean: 94.50% CI: 91.33% - 97.67%
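
The intervals above are consistent with a 95% normal-approximation confidence interval over 200 binary outcomes (correct or incorrect), computed with the sample standard deviation. That method is an assumption inferred from the numbers, not something stated in this issue. The helper name `mean_ci` and the counts 78 and 189 (39.00% and 94.50% of 200) are illustrative. A minimal sketch:

```python
import math
import statistics


def mean_ci(correct: int, total: int, z: float = 1.96) -> tuple:
    """Return (mean, lower, upper) for a binary accuracy score.

    Assumption: normal approximation with the sample standard
    deviation (n - 1 in the denominator) and z = 1.96 for 95%.
    """
    # One score per question: 1.0 if answered correctly, else 0.0.
    scores = [1.0] * correct + [0.0] * (total - correct)
    mean = statistics.mean(scores)
    # Standard error of the mean.
    sem = statistics.stdev(scores) / math.sqrt(total)
    return mean, mean - z * sem, mean + z * sem


if __name__ == "__main__":
    # Counts are derived from the reported means, not taken from the notebook.
    for name, correct in [("GEMINI_TOOL", 78), ("GEMINI_JSON", 189)]:
        mean, lower, upper = mean_ci(correct, 200)
        print(f"{name} - Mean: {mean:.2%} CI: {lower:.2%} - {upper:.2%}")
```

Under these assumptions the two intervals do not overlap, so sampling noise on 200 questions is unlikely to explain the gap.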

You can verify this by extracting the text generated in _raw_response.
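
One way to run that check, sketched under assumptions: the generated output has already been pulled out of `_raw_response` as a JSON string (how to do that depends on the client and is not shown here), and the schema has two hypothetical fields, `reasoning` followed by `answer`. The helper name `reasoning_precedes_answer` is also hypothetical.

```python
import json


def reasoning_precedes_answer(
    raw_json: str,
    reasoning_key: str = "reasoning",
    answer_key: str = "answer",
) -> bool:
    """Return True if the reasoning field was emitted before the answer field."""
    # json.loads keeps keys in the order they appear in the text.
    keys = list(json.loads(raw_json))
    return keys.index(reasoning_key) < keys.index(answer_key)


# Output that follows the schema order: the model reasons, then answers.
in_schema_order = '{"reasoning": "3 + 4 = 7, and 7 * 2 = 14", "answer": 14}'
# Output with the keys reordered: the answer is committed before any reasoning.
reordered = '{"answer": 14, "reasoning": "3 + 4 = 7, and 7 * 2 = 14"}'

print(reasoning_precedes_answer(in_schema_order))
print(reasoning_precedes_answer(reordered))
```

When the answer key comes first, the model has already produced its answer by the time it writes the reasoning, so the reasoning cannot inform it.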

To Reproduce
Here's a notebook you can use to reproduce the result: https://github.com/dylanjcastillo/blog/blob/main/_extras/gemini-structured-outputs/gemini-structured-outputs-benchmarks-instructor.ipynb

I wrote a more detailed analysis here: http://dylancastillo.co/posts/gemini-structured-outputs.html

Expected behavior
Given that this is a fairly typical use case, I'd suggest making GEMINI_JSON the default mode and warning users about this issue.


@github-actions github-actions bot added the bug Something isn't working label Dec 27, 2024
@dylanjcastillo dylanjcastillo linked a pull request Dec 27, 2024 that will close this issue