w/ cot mode for "thinking" models #109

olive-jy-song · 2025-02-06T07:39:38Z

Thank you for the timely leaderboard updates. I have a couple of questions about the w/ CoT column and would appreciate some clarification:

I noticed that the w/ CoT mode first has the model generate a CoT and then runs a second inference asking it to answer based on that CoT. Could you explain the rationale behind this two-stage design?
How does the w/ CoT mode work for the "thinking" models, and how would that be different from the no CoT mode?

Thanks!

bys0318 · 2025-02-13T10:23:59Z

Hi, we follow the design of GPQA for both the w/o CoT and w/ CoT modes. In w/ CoT mode, we first ask the model to generate a chain of thought to derive the answer. To make the answer easy to extract, a second stage then asks the model to output the answer directly based on that chain of thought.
For reasoning models such as o1 and R1, the w/ CoT setting is not necessary, as these models automatically output their thinking process whether prompted or not. Nevertheless, we retain this evaluation setting to ensure consistency in results and facilitate comparison.
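For illustration, here is a minimal sketch of the two-stage w/ CoT protocol described above, next to the single-call w/o CoT mode. It assumes a generic `model(prompt) -> str` callable and multiple-choice answers A to D. The prompt wording (`COT_SUFFIX`, `EXTRACT_SUFFIX`), the answer format and all function names are hypothetical, not the benchmark's actual templates or code.

```python
import re
from typing import Callable, Optional

# Hypothetical model interface: takes a prompt string, returns the model's reply.
Model = Callable[[str], str]

# Hypothetical prompt wording; the benchmark's real templates may differ.
COT_SUFFIX = "Let's think step by step."
EXTRACT_SUFFIX = (
    "Based on the above, what is the single, most likely answer choice? "
    'Answer in the format "The correct answer is (X)".'
)

# Matches e.g. "The correct answer is (B)" or "The correct answer is B".
ANSWER_RE = re.compile(r"The correct answer is \(?([A-D])\)?")


def extract_choice(text: str) -> Optional[str]:
    """Return the answer letter from a reply, or None if no answer is found."""
    match = ANSWER_RE.search(text)
    return match.group(1) if match else None


def answer_no_cot(model: Model, question: str) -> Optional[str]:
    """w/o CoT: a single call that asks for the answer directly."""
    reply = model(f"{question}\n{EXTRACT_SUFFIX}")
    return extract_choice(reply)


def answer_with_cot(model: Model, question: str) -> Optional[str]:
    """w/ CoT: stage 1 generates reasoning; stage 2 asks for the answer given it."""
    # Stage 1: free-form chain of thought, which may be long and hard to parse.
    cot = model(f"{question}\n{COT_SUFFIX}")
    # Stage 2: feed the reasoning back and request a short, easily parsed answer.
    reply = model(f"{question}\n{COT_SUFFIX}\n{cot}\n\n{EXTRACT_SUFFIX}")
    return extract_choice(reply)


if __name__ == "__main__":
    # Stub model for demonstration only: reasons on stage-1 prompts, answers otherwise.
    def _demo_model(prompt: str) -> str:
        if prompt.endswith(COT_SUFFIX):
            return "The passage supports option B over the others."
        return "The correct answer is (B)."

    print(answer_with_cot(_demo_model, "Which option is correct?"))  # prints B
```

The second stage exists only so that the final answer can be parsed reliably from a fixed format, regardless of how the free-form reasoning is phrased. For a reasoning model, the `model` call would already produce a thinking trace in both modes, which is why the w/ CoT setting adds little for such models.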
