-
Are you asking, "Why doesn't this repository use training based on the game records obtained from self-play?"
-
Yes. I am confused about why self-play among multiple agents with a reward signal is not enough. Card-game models for imperfect-information games, e.g. DouZero, are trained entirely through self-play, and they work well.
-
I see your point. It is very reasonable to have AIs of a certain strength play against each other and then use those game records to advance the learning process further; this is a standard method in reinforcement learning, and I plan to implement this traditional self-play approach in the future. However, in recent years the technique known as offline reinforcement learning, which does not require self-play and uses only existing game records, has been advancing rapidly. Its advantage is precisely that it avoids self-play, which tends to limit the learning speed of traditional reinforcement learning, so training can proceed at a much faster rate. One of the intermediate goals of this project is to determine how strong a mahjong AI can become through offline reinforcement learning alone. This is why the current repository does not have a self-play training function.
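
For illustration only, here is a minimal sketch of that offline setup: the learner trains on a fixed dataset of logged (state, action) pairs and never interacts with a game simulator. The feature and action sizes, the network, and the plain behavioural-cloning loss are assumptions made for the example, not this repository's actual training pipeline.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

NUM_FEATURES = 512   # assumed size of an encoded game state
NUM_ACTIONS = 64     # assumed size of the discrete action space

# Stand-in for transitions parsed from game records; in practice these would be
# loaded from the prepared training data, not generated randomly.
observations = torch.randn(10_000, NUM_FEATURES)
actions = torch.randint(0, NUM_ACTIONS, (10_000,))
loader = DataLoader(TensorDataset(observations, actions), batch_size=256, shuffle=True)

policy = nn.Sequential(
    nn.Linear(NUM_FEATURES, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_ACTIONS),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for obs, act in loader:
        # Every update comes from the static log; no self-play rollouts are run,
        # so throughput is bounded by data loading and the optimizer rather than
        # by game simulation speed.
        loss = loss_fn(policy(obs), act)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Actual offline RL methods add machinery on top of this (value estimation, conservatism, etc.), but the structural point is the same: the training loop consumes a fixed log instead of freshly generated self-play games.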
-
As the readme of this repository indicates, a plentiful game log is necessary to prepare for training. Why doesn't this model randomly shuffle and generate the deck (the live wall and the dead wall, ピーパイ and ワンパイ in Japanese), and then spawn four agents as players?
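
For concreteness, a rough sketch of the kind of self-play setup being suggested: shuffle the 136 tiles, split off the 14-tile dead wall, and deal starting hands to four agents that would then play against each other to generate logs. The integer tile encoding and the helper names are hypothetical and not part of this repository.

```python
import random

def build_deck():
    # 34 tile kinds x 4 copies = 136 tiles, encoded as integers 0..33.
    return [kind for kind in range(34) for _ in range(4)]

def deal(seed=None):
    rng = random.Random(seed)
    deck = build_deck()
    rng.shuffle(deck)
    dead_wall = deck[:14]                 # ワンパイ: 14 tiles set aside
    live_wall = deck[14:]                 # ピーパイ: tiles available during play
    hands = [sorted(live_wall[i * 13:(i + 1) * 13]) for i in range(4)]
    live_wall = live_wall[52:]            # 70 drawable tiles remain after the deal
    return hands, live_wall, dead_wall

hands, live_wall, dead_wall = deal(seed=0)
print(len(live_wall), len(dead_wall), [len(h) for h in hands])  # 70 14 [13, 13, 13, 13]
```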