Thank you for the timely updates to the leaderboard. I have a couple of questions about the w/ CoT column and was hoping for some clarification:
I noticed that the w/ CoT mode is designed so that a CoT is inferred first, followed by a second inference asking the model to answer based on that CoT. Could you explain the rationale behind this design?
How does the w/ CoT mode work for the "thinking" models, and how does it differ from the no CoT mode?
Thanks!
Hi, we follow the design of GPQA for both the w/o CoT mode and the w/ CoT mode. In w/ CoT mode, we first ask the model to generate a chain-of-thought that derives the answer. Then, to make the answer easy to extract, a second stage asks the model to output the answer directly, based on that chain-of-thought.
For reasoning models such as o1 and R1, the w/ CoT setting is not necessary, as these models automatically output their thinking process whether prompted or not. Nevertheless, we retain this evaluation setting to ensure consistency in results and facilitate comparison.
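The two-stage w/ CoT procedure described above can be sketched roughly as follows. This is a minimal illustration, not the leaderboard's actual code: the prompt wording, the `answer_with_cot` helper, and the `generate` callable (any function that maps a prompt string to the model's text output) are all assumptions made for the example.

```python
import re
from typing import Callable, List, Optional, Tuple


def answer_with_cot(
    question: str,
    choices: List[str],
    generate: Callable[[str], str],  # hypothetical: prompt in, model text out
) -> Tuple[str, Optional[str]]:
    """Run a two-stage multiple-choice query and return (cot, answer_letter)."""
    letters = "ABCD"
    options = "\n".join(f"({letters[i]}) {c}" for i, c in enumerate(choices))
    base = f"Question: {question}\n{options}\n"

    # Stage 1: ask only for the reasoning (assumed prompt wording).
    cot = generate(base + "Let's think step by step.")

    # Stage 2: feed the reasoning back and ask for the final letter only,
    # so the answer can be extracted with a simple pattern.
    reply = generate(
        base
        + f"Reasoning: {cot}\n"
        + "Based on the reasoning above, reply with the letter of the single best answer."
    )

    # Take the first standalone letter A-D in the short second-stage reply.
    match = re.search(r"\b([A-D])\b", reply)
    return cot, match.group(1) if match else None
```

Because the second call is asked for the letter alone, extraction does not have to search a long free-form reasoning trace. In a w/o CoT setting, only a single direct-answer call of the second kind would be made, without the reasoning included in the prompt.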