In MCTS, a "rollout" simulates a sequence of actions from a given node to a terminal state in order to estimate the outcome or reward of that path. Once a terminal node is reached, however, the reasoning path has ended, so no further rollouts are performed from it. Instead, the terminal node's confidence score is derived from the rollouts conducted while generating that trajectory and from majority voting across the terminal nodes. This score reflects the model's assessment of the correctness or quality of the reasoning path leading to that terminal state.
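To make the majority-voting part of that description concrete, here is a minimal sketch of one way a terminal node's confidence could be computed from votes alone. It is not taken from this repository: `TerminalNode`, `majority_vote_confidence`, and the rule "confidence = fraction of terminal nodes that share the same final answer" are all assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TerminalNode:
    """Hypothetical terminal node: stores only the final answer of one reasoning path."""
    answer: str


def majority_vote_confidence(terminals):
    """Assumed scoring rule: each terminal node's confidence is the fraction
    of terminal nodes whose final answer matches its own."""
    counts = Counter(node.answer for node in terminals)
    total = len(terminals)
    # No rollout is started from a terminal node here; the score comes from votes only.
    return [counts[node.answer] / total for node in terminals]


terminals = [
    TerminalNode("42"),
    TerminalNode("42"),
    TerminalNode("17"),
    TerminalNode("42"),
]
confidences = majority_vote_confidence(terminals)
print(confidences)
```

Under this reading, nothing is simulated from a terminal node itself, so any rollout count recorded on it would have to come from bookkeeping elsewhere in the search.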
Given this, why does the terminal node still have rollouts?