
Discussing the formula logic of LIM and some related questions #14

Open

qinb opened this issue Mar 21, 2025 · 1 comment

Comments

@qinb

qinb commented Mar 21, 2025

According to the alignment score formula, LIMR effectively does not learn from hard examples, which feels rather counterintuitive. Could the authors explain this?
One more question: aren't the conclusions of Figure 2a and Figure 2b in the paper somewhat contradictory? Figure 2a suggests that complex learning dynamics help improve the model, while Figure 2b shows that the sample with score = 0.83 behaves very smoothly. By the logic of Figure 2a, a smooth sample should contribute nothing and be excluded from training, yet in Figure 2b the smooth (high-score) sample is the one to include. How should this be reconciled? Also, why does the Reward axis in the figure read 20, 40, 60, ...? Doesn't the reward take only three values: 1, -0.5, and -1? @hongtangshui
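
For readers without the paper open: as best I can reconstruct it from the LIMR paper (the notation below is my paraphrase, not a quote), the alignment score compares each sample's reward trajectory against the model-average trajectory:

$$
s_i = 1 - \frac{\sum_{k=1}^{K}\left(r_i^{k} - r_{\text{avg}}^{k}\right)^{2}}{\sum_{k=1}^{K}\left(1 - r_{\text{avg}}^{k}\right)^{2}}
$$

where $r_i^{k}$ is sample $i$'s average correctness reward at epoch $k$, $r_{\text{avg}}^{k}$ is the batch-average reward at epoch $k$ (both scaled to $[0, 1]$), and $K$ is the number of epochs. A sample whose reward stays at 0 throughout training deviates maximally from the average curve, which is why this formula assigns never-solved hard examples a low score.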

@hongtangshui
Collaborator

1. It is not that hard problems are excluded from learning; rather, it is problems that remain hard to solve even after training that are excluded. If a problem is difficult for the model at first but the model learns to solve it during training, it may help the model even more (the selection logic is sketched in the code below).
2. Figure 2a does not imply that "complex learning dynamics help improve the model"; whether something actually helps has to be settled by experimental results. The score = 0.83 sample in fact rises substantially early in training and only levels off later.
3. The reward axis is indeed not described clearly enough. We only consider the correctness reward, normalized to 0-100: a correct response gets reward = 100 and an incorrect response gets reward = 0.
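
To make points 1 and 3 concrete, here is a minimal runnable sketch of the selection logic, based on the reconstructed formula quoted in the question thread above. The function name and the example curves are illustrative choices, not code from the LIMR repo, and rewards are kept in [0, 1] here (divide the figures' 0-100 scale by 100).

```python
import numpy as np

def alignment_score(sample_curve, avg_curve):
    """Hedged reconstruction of the LIM alignment score.

    `sample_curve` is one sample's average correctness reward per epoch,
    `avg_curve` the batch-average reward per epoch, both in [0, 1].
    The score is 1 minus the sample's squared deviation from the average
    curve, normalized by the squared gap between the average curve and a
    perfect (always-correct) curve. See the LIMR paper for the
    authoritative definition.
    """
    sample_curve = np.asarray(sample_curve, dtype=float)
    avg_curve = np.asarray(avg_curve, dtype=float)
    num = np.sum((sample_curve - avg_curve) ** 2)
    den = np.sum((1.0 - avg_curve) ** 2)
    return 1.0 - num / den

# Batch-average reward curve over training: rising, then plateauing.
avg = [0.2, 0.4, 0.6, 0.7, 0.75]

# A hard sample the model learns to solve: it starts low but tracks the
# average curve, so its score is high and it is kept (point 1 above).
learned = [0.1, 0.35, 0.65, 0.75, 0.8]

# A hard sample the model never solves: its curve stays at 0, deviating
# maximally from the average, so its score is low and it is filtered out.
never_solved = [0.0, 0.0, 0.0, 0.0, 0.0]

print(alignment_score(learned, avg))       # ~0.98 -> selected
print(alignment_score(never_solved, avg))  # negative -> filtered out
```

Under this reading, the "flat" score = 0.83 sample in Figure 2b is flat because it tracks the plateauing average curve after its early rise, while a never-solved hard sample is flat at zero and drifts ever further from the average curve, so the two are scored very differently.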
