w/ cot mode for "thinking" models #109

Open
olive-jy-song opened this issue Feb 6, 2025 · 1 comment

Comments

@olive-jy-song

Thank you for the timely updates to the leaderboard. I have a couple of questions about the w/ CoT column and was hoping for some clarification:

  1. I noticed that the w/ CoT mode first runs one inference to generate a CoT, then a second inference that asks the model to answer based on that CoT. Could you explain the rationale for this design?

  2. How does the w/ CoT mode work for the "thinking" models, and how does it differ from the w/o CoT mode?

Thanks!

@bys0318
Member

bys0318 commented Feb 13, 2025

Hi, we follow the design of GPQA for both the w/o CoT and the w/ CoT modes. In w/ CoT mode, we first ask the model to generate a chain of thought that derives the answer. A second stage then asks the model to output the answer directly, based on that chain of thought, which makes the answer easier to extract.
For reasoning models such as o1 and R1, the w/ CoT setting is not necessary, since these models output their thinking process whether or not they are prompted to. Nevertheless, we keep this evaluation setting for consistency and to make results easier to compare.
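
For illustration only, the two-stage flow described above could be sketched roughly as follows. This is not the benchmark's actual code: the prompt templates, the `generate` callable (any function that maps a prompt string to a model response), and the "The correct answer is (X)" format are all assumptions made for the sketch.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt templates; the real evaluation prompts may differ.
COT_PROMPT = "{question}\n\nLet's think step by step."
ANSWER_PROMPT = (
    "{question}\n\nReasoning:\n{cot}\n\n"
    'Based on the reasoning above, reply in the format "The correct answer is (X)".'
)
DIRECT_PROMPT = '{question}\n\nReply in the format "The correct answer is (X)".'


def extract_choice(text: str) -> Optional[str]:
    """Return the chosen option letter (A-D), or None if no answer is found."""
    match = re.search(r"The correct answer is \(?([A-D])\)?", text)
    return match.group(1) if match else None


def answer_with_cot(question: str, generate: Callable[[str], str]) -> Optional[str]:
    """w/ CoT: stage 1 elicits reasoning, stage 2 asks for a short final answer."""
    cot = generate(COT_PROMPT.format(question=question))
    final = generate(ANSWER_PROMPT.format(question=question, cot=cot))
    return extract_choice(final)


def answer_without_cot(question: str, generate: Callable[[str], str]) -> Optional[str]:
    """w/o CoT: a single call that asks for the answer directly."""
    return extract_choice(generate(DIRECT_PROMPT.format(question=question)))
```

Because the second call only has to produce a short, fixed-format reply, a simple pattern match is enough to pull out the answer, rather than having to locate it inside free-form reasoning.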
