This repository is a branch of the original Mortal repository, transitioning from value-based methods to policy-based methods.
Initially developed in 2022 based on Mortal V2, it was migrated to Mortal V4 in 2024.
This branch features:
- A more stable performance optimization process
- Enhanced final performance
Note:
The performance results are based on a comparison with the baseline model. The baseline used for testing has been uploaded to RiichLab (mjai.app) and has maintained a stable rank across multiple evaluation batches.
The documentation is consistent with the original repository. Read the Documentation.
Requirement: PyTorch >= 2.4.0
Tested with: PyTorch 2.5.1 + CUDA 12.4 (install via pip)
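As a quick sanity check of the environment (an illustrative snippet, not part of the repository), you can print the installed PyTorch and CUDA versions:

```python
import torch

# Print the installed PyTorch version, the CUDA toolkit it was built against,
# and whether a GPU is visible to this process.
print(torch.__version__)          # e.g. "2.5.1+cu124"
print(torch.version.cuda)         # e.g. "12.4"
print(torch.cuda.is_available())  # True if a usable GPU is detected
```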
Mortal-Policy adopts an offline-to-online training approach:
- **Data Preparation**: Collect samples in `mjai` format. (A minimal log-reading sketch appears after this list.)
- **Configuration**: Rename `config.example.toml` to `config.toml` and set the hyperparameters.
- **Training Stages**:
  - **Offline Phase 1 (Advantage Weighted Regression)**: Run `train_offline_phase1.py`. (A loss sketch appears after this list.)
  - **Offline Phase 2 (Behavior Proximal Policy Optimization)**: Optional; only suitable when online training is unavailable. The code is coming soon.
  - **Online Phase (Policy Gradient with Importance Sampling and PPO-style Clipping)**: Run `train_online.py`. (A clipped-surrogate sketch appears after this list.)
    - While online-only training is possible, it is not recommended.
    - Advantage Weighted Regression (AWR) is not included in the original implementation based on Mortal V2. You can try the following alternative options: Behavior Cloning (BC) or distillation from the value-based Mortal.
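For reference, `mjai` logs are newline-delimited JSON events. The sketch below only illustrates iterating over such a log; the file name and the event fields shown are illustrative assumptions, not an interface defined by this repository:

```python
import json

# Minimal sketch: walk through the events of one mjai log
# (one JSON object per line). "game.mjai.json" is a placeholder path.
with open("game.mjai.json") as f:
    for line in f:
        event = json.loads(line)
        if event.get("type") == "dahai":  # a discard event
            print(event["actor"], event["pai"])
```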
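Offline Phase 1 uses Advantage Weighted Regression. As a rough illustration of the idea (not the loss code used by `train_offline_phase1.py`), AWR weights the log-likelihood of logged actions by an exponentiated advantage; `beta` and the weight clip below are assumed hyperparameters:

```python
import torch

def awr_loss(logp: torch.Tensor, advantages: torch.Tensor,
             beta: float = 1.0, max_weight: float = 20.0) -> torch.Tensor:
    """Advantage Weighted Regression: advantage-weighted negative log-likelihood."""
    # exp(A / beta), clipped to keep the regression weights bounded.
    weights = torch.exp(advantages / beta).clamp(max=max_weight).detach()
    return -(weights * logp).mean()
```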
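The online phase is described as policy gradient with importance sampling and PPO-style clipping. A generic clipped-surrogate loss looks like the following; this is a textbook sketch under those assumptions, not the code in `train_online.py`:

```python
import torch

def ppo_clip_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
                  advantages: torch.Tensor, clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective (returned as a loss to be minimized)."""
    # Importance-sampling ratio between the current and the behavior policy.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO takes the pessimistic minimum of the two surrogates.
    return -torch.min(unclipped, clipped).mean()
```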
Maintained alignment with the original Mortal repository. For details, see this post.
The weights, hyperparameters, and some online training features were removed from this branch when it was open-sourced.
Copyright (C) 2021-2022 Equim
Copyright (C) 2025 Nitasurin
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.