How to verify the string answer for MATH dataset? #21

yuxiaooye · 2025-01-17T08:51:40Z

Hi! Thanks for your awesome work!

I noticed that evaluate.py only has verify_float(), which handles numerical answers. However, some ground-truth answers in the MATH dataset are strings such as "p - q". Could you provide a verify function that handles these cases?

Thanks!

RewindL · 2025-01-21T07:31:21Z

I think the only way is to check with a regular expression (re) whether p-q or $p$-$q$ appears in the output, or to use a stronger LLM as a judge (which might be slow and unstable). I hope there is a better way.
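A minimal sketch of the regex-based check described above, assuming it is enough to ignore whitespace and `$` delimiters before looking for the ground truth in the model output. The function name `contains_answer` is hypothetical and is not part of evaluate.py.

```python
import re


def contains_answer(output: str, answer: str) -> bool:
    """Return True if `answer` occurs in `output`, ignoring whitespace and `$`.

    Hypothetical helper: it only illustrates the substring idea and does not
    understand mathematical equivalence.
    """
    def squash(text: str) -> str:
        # Drop whitespace and dollar signs so "p - q", "p-q" and "$p$-$q$"
        # all collapse to the same string.
        return re.sub(r"[\s$]+", "", text)

    return squash(answer) in squash(output)


if __name__ == "__main__":
    print(contains_answer("So the answer is $p$-$q$.", "p - q"))
    print(contains_answer("The answer is q - p", "p - q"))
```

Because this is a plain substring test, it can produce false positives (for example, "p-q" also matches inside "2p-q"), so it is probably best treated as a rough filter rather than a final verdict.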

yuxiaooye · 2025-01-23T04:26:15Z

> I think the only way is to check whether p-q or $p$-$q$ exists in the output through re, or use a stronger LLM (might be slow and unstable). Hope there is a better way.

@RewindL I found the evaluation script in the original repo of the MATH dataset, and it works well: https://github.com/hendrycks/math/blob/357963a7f5501a6c1708cf3f3fb0cdf525642761/modeling/evaluate_gpt3.py#L106
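For readers who want a feel for string-based checking without opening the link, here is a rough sketch of one common approach: take the last `\boxed{...}` expression from the model output and compare it with the ground truth after light normalization. This is an assumption-laden illustration, not the linked script's code; the names `last_boxed`, `normalize`, and `verify_string` are hypothetical, and the normalization rules are deliberately minimal.

```python
from typing import Optional


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces


def normalize(answer: str) -> str:
    """Lightly normalize a LaTeX answer string (assumed rules, not exhaustive)."""
    answer = "".join(answer.split())  # remove all whitespace
    answer = answer.replace("$", "").replace("\\!", "")
    answer = answer.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")
    return answer.rstrip(".")


def verify_string(output: str, truth: str) -> bool:
    """Compare the last boxed answer in `output` with `truth` as strings."""
    pred = last_boxed(output)
    return pred is not None and normalize(pred) == normalize(truth)


if __name__ == "__main__":
    print(verify_string("The answer is \\boxed{p - q}.", "p-q"))
    print(verify_string("\\boxed{q-p}", "p-q"))
```

Exact matching after normalization is stricter than the substring check, so it avoids partial matches, but it still treats mathematically equal forms such as "-q+p" and "p-q" as different.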

RewindL · 2025-01-23T06:36:04Z

> > I think the only way is to check whether p-q or $p$-$q$ exists in the output through re, or use a stronger LLM (might be slow and unstable). Hope there is a better way.
>
> @RewindL I found the evaluation script from original repo of MATH dataset: https://github.com/hendrycks/math/blob/357963a7f5501a6c1708cf3f3fb0cdf525642761/modeling/evaluate_gpt3.py#L106 which works well!

Wow, that is terrific. Thanks for sharing. I have been verifying results on MATH recently too.
