-
Are you asking, "Why doesn't this repository use training based on the game records obtained from self-play?"
-
Yes. I am confused about why self-play among multiple agents with a reward signal is not enough. Card-game models for imperfect-information games, e.g. DouZero, are trained entirely through self-play, and they work well.
-
I see your point. It is very reasonable to have AIs of a certain strength play against each other and then use those game records to advance the learning process further; this is a standard method in reinforcement learning, and I plan to implement this traditional self-play approach in the future. However, in recent years the technique known as offline reinforcement learning, which does not require self-play and uses only existing game records, has been advancing rapidly. Its advantage is precisely that it avoids self-play, which tends to limit the learning speed of traditional reinforcement learning, so training can proceed at a much faster rate. One of the intermediate goals of this project is to determine how strong a mahjong AI can become through offline reinforcement learning alone. This is why the current repository does not have a self-play training function.
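
For illustration only, here is a minimal sketch of that offline setup: the learner trains on a fixed dataset of logged (state, action) pairs and never interacts with a game simulator. The feature and action sizes, the network, and the plain behavioural-cloning loss are assumptions made for the example, not this repository's actual training pipeline.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

NUM_FEATURES = 512   # assumed size of an encoded game state
NUM_ACTIONS = 64     # assumed size of the discrete action space

# Stand-in for transitions parsed from game records; in practice these would be
# loaded from the prepared training data, not generated randomly.
observations = torch.randn(10_000, NUM_FEATURES)
actions = torch.randint(0, NUM_ACTIONS, (10_000,))
loader = DataLoader(TensorDataset(observations, actions), batch_size=256, shuffle=True)

policy = nn.Sequential(
    nn.Linear(NUM_FEATURES, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_ACTIONS),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for obs, act in loader:
        # Every update comes from the static log; no self-play rollouts are run,
        # so throughput is bounded by data loading and the optimizer rather than
        # by game simulation speed.
        loss = loss_fn(policy(obs), act)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Actual offline RL methods add machinery on top of this (value estimation, conservatism, etc.), but the structural point is the same: the training loop consumes a fixed log instead of freshly generated self-play games.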
-
As the readme of this repository indicates, a plentiful game log is necessary to prepare for training. Why doesn't this model randomly shuffle and generate the deck (the live wall and the dead wall, ピーパイ and ワンパイ in Japanese), and then spawn four agents as players?
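
For concreteness, a rough sketch of the kind of self-play setup being suggested: shuffle the 136 tiles, split off the 14-tile dead wall, and deal starting hands to four agents that would then play against each other to generate logs. The integer tile encoding and the helper names are hypothetical and not part of this repository.

```python
import random

def build_deck():
    # 34 tile kinds x 4 copies = 136 tiles, encoded as integers 0..33.
    return [kind for kind in range(34) for _ in range(4)]

def deal(seed=None):
    rng = random.Random(seed)
    deck = build_deck()
    rng.shuffle(deck)
    dead_wall = deck[:14]                 # ワンパイ: 14 tiles set aside
    live_wall = deck[14:]                 # ピーパイ: tiles available during play
    hands = [sorted(live_wall[i * 13:(i + 1) * 13]) for i in range(4)]
    live_wall = live_wall[52:]            # 70 drawable tiles remain after the deal
    return hands, live_wall, dead_wall

hands, live_wall, dead_wall = deal(seed=0)
print(len(live_wall), len(dead_wall), [len(h) for h in hands])  # 70 14 [13, 13, 13, 13]
```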