Hi, thanks for the nice work!

I am trying to develop a method for curating datasets similar to LIMR. It would be helpful if you could release the data used to calculate the LIMR score, and potentially the model trained on the full dataset, so that I do not need to rerun the RL training. If I understand the code correctly, the data should be the ./data/output/math.8k.json file.
If I understand correctly, the scores JSON contains the alignment scores. What I am interested in instead is the per-epoch rewards of all the samples, which are used to calculate the alignment score: the rewards r_i^k described in Section 2.2.1. Sorry for the confusion. Please let me know whether this data is already in the repo or, if not, whether you would be willing to release it.