What does "the terminal node’s confidence score achieved from rollouts" mean in paper? #7

Open
tongyx361 opened this issue Sep 23, 2024 · 1 comment

Comments

@tongyx361

The paper says: "We compute each trajectory’s final score by multiplying its reward with the terminal node’s confidence score achieved from rollouts."

Why does the terminal node still have rollouts?

@Thunderbeee
Collaborator

In MCTS, a "rollout" simulates a sequence of actions from a given node until a terminal state is reached, in order to estimate the outcome or reward of that path. A terminal node marks the end of a reasoning path, so no further rollouts start from it. The rollouts mentioned in that sentence are the ones performed while the trajectory was being generated: the terminal node's confidence score is derived from those rollouts together with majority voting over the terminal nodes. The score reflects the model's assessment of the correctness or quality of the reasoning path that leads to that terminal state.
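
As a rough illustration only (not the repository's actual code), the scoring could look like the sketch below. It assumes the confidence of a terminal node is the share of rollout terminal nodes that agree with its answer, which is one plausible reading of "majority voting"; `Trajectory`, `terminal_confidence`, and `final_scores` are hypothetical names, and the rewards and answers are made-up values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    answer: str    # final answer stored at the terminal node
    reward: float  # trajectory reward (made-up value for illustration)


def terminal_confidence(rollout_answers):
    """Map each distinct answer to its share of the rollout terminal nodes.

    Assumption: confidence = vote share among answers reached by rollouts.
    """
    total = len(rollout_answers)
    if total == 0:
        return {}
    counts = Counter(rollout_answers)
    return {answer: n / total for answer, n in counts.items()}


def final_scores(trajectories, rollout_answers):
    """Final score = trajectory reward * confidence of its terminal answer."""
    confidence = terminal_confidence(rollout_answers)
    return [t.reward * confidence.get(t.answer, 0.0) for t in trajectories]


# Answers reached at terminal nodes over four rollouts (hypothetical data).
rollout_answers = ["42", "42", "41", "42"]
trajectories = [Trajectory("42", 0.8), Trajectory("41", 0.9)]

scores = final_scores(trajectories, rollout_answers)
best = trajectories[max(range(len(scores)), key=scores.__getitem__)]
print(best.answer, scores)
```

Under this assumption no rollout ever starts from a terminal node: the confidence is a statistic gathered from rollouts that ended at terminal nodes, and a trajectory with a higher reward can still lose to one whose answer more rollouts agree on.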
