This repository hosts the code release for the paper "Orchestrated Value Mapping for Reinforcement Learning", published at ICLR 2022. This work was done by Mehdi Fatemi (Microsoft Research) and Arash Tavakoli (Max Planck Institute for Intelligent Systems).
We release a flexible framework, built upon Dopamine (Castro et al., 2018), for building and orchestrating various mappings over different reward decomposition schemes. This enables the research community to easily explore the design space that our theory opens up and investigate new convergent families of algorithms.
The code has been developed by Arash Tavakoli.
If you make use of our work, please cite it as follows:
@inproceedings{Fatemi2022Orchestrated,
  title={Orchestrated Value Mapping for Reinforcement Learning},
  author={Mehdi Fatemi and Arash Tavakoli},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=c87d0TS4yX}
}
We install the required packages within a virtual environment.
Create a virtual environment using conda via:
conda create --name maprl-env python=3.8
conda activate maprl-env
Atari benchmark. To set up the Atari suite, please follow the steps outlined here.
Install Dopamine. Install a compatible version of Dopamine with pip:
pip install dopamine-rl==3.1.10
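As an optional sanity check, you can verify that Dopamine is importable from within the environment:
python -c "import dopamine; print(dopamine.__file__)"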
To easily experiment within our framework, install it from source and modify the code directly:
git clone https://github.com/microsoft/orchestrated-value-mapping.git
cd orchestrated-value-mapping
pip install -e .
Change to the workspace directory:
cd map_rl
To train a LogDQN agent, similar to that introduced by van Seijen, Fatemi & Tavakoli (2019), run the following command:
python -um map_rl.train \
--base_dir=/tmp/log_dqn \
--gin_files='configs/map_dqn.gin' \
--gin_bindings='MapDQNAgent.map_func_id="[log,log]"' \
--gin_bindings='MapDQNAgent.rew_decomp_id="polar"' &
Here, polar refers to the reward decomposition scheme described in Equation 13 of Fatemi & Tavakoli (2022) (which has two reward channels), and [log,log] results in a logarithmic mapping for each of the two reward channels.
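For intuition, the polar scheme routes the positive and negative parts of the reward to separate channels that sum back to the original reward. The following is a minimal illustrative sketch in NumPy only; it is not the repository's implementation (see Equation 13 of the paper and map_dqn_agent.py for the exact form):
import numpy as np

def polar_decomposition(reward):
    # Illustrative only: split the reward by sign into two non-negative
    # channels such that reward = positive_channel - negative_channel.
    positive_channel = np.maximum(reward, 0.0)
    negative_channel = np.maximum(-reward, 0.0)
    return positive_channel, negative_channel

# Example: a reward of -1.5 goes entirely to the negative channel.
print(polar_decomposition(np.array([-1.5, 0.0, 2.0])))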
Train a LogLinDQN agent, similar to that described by Fatemi & Tavakoli (2022), using:
python -um map_rl.train \
--base_dir=/tmp/loglin_dqn \
--gin_files='configs/map_dqn.gin' \
--gin_bindings='MapDQNAgent.map_func_id="[loglin,loglin]"' \
--gin_bindings='MapDQNAgent.rew_decomp_id="polar"' &
To instantiate a custom agent, simply set a mapping function for each reward channel and choose a reward decomposition scheme. For instance, the following setting
MapDQNAgent.map_func_id="[log,identity]"
MapDQNAgent.rew_decomp_id="polar"
results in a logarithmic mapping for the positive-reward channel and the identity mapping (same as in DQN) for the negative-reward channel.
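Such a setting is passed to the trainer in the same way as above; for example (the base directory below is just an illustrative choice):
python -um map_rl.train \
--base_dir=/tmp/log_identity_dqn \
--gin_files='configs/map_dqn.gin' \
--gin_bindings='MapDQNAgent.map_func_id="[log,identity]"' \
--gin_bindings='MapDQNAgent.rew_decomp_id="polar"' &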
To use more complex reward decomposition schemes, such as Configurations 1 and 2 from Fatemi & Tavakoli (2022), you can use, for example:
MapDQNAgent.map_func_id="[identity,identity,log,log,loglin,loglin]"
MapDQNAgent.rew_decomp_id="config_1"
To instantiate an ensemble of two learners, each using a polar reward decomposition, use the following syntax:
MapDQNAgent.map_func_id="[loglin,loglin,log,log]"
MapDQNAgent.rew_decomp_id="two_ensemble_polar"
To implement custom mapping functions and reward decomposition schemes, we suggest drawing on the insights from Fatemi & Tavakoli (2022) and following the format of the existing methods in map_dqn_agent.py.
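As a starting point, a custom mapping is essentially a forward function paired with its inverse, applied per reward channel. The sketch below is hypothetical: the function names and the (forward, inverse) interface are assumptions, so mirror the existing entries in map_dqn_agent.py rather than using this verbatim.
import numpy as np

def sqrt_map(x):
    # Hypothetical forward mapping: signed square root, which compresses
    # large magnitudes while preserving sign.
    return np.sign(x) * np.sqrt(np.abs(x))

def sqrt_map_inverse(y):
    # Inverse of the mapping above, used to recover values on the
    # original scale.
    return np.sign(y) * np.square(y)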