Maze 3D Collaborative Learning on a shared task

Description

A human-agent collaborative game in a virtual environment, based on the work of Shafti et al. (2020) [1]. Collaborative learning is achieved through Deep Reinforcement Learning (DRL): the Soft Actor-Critic (SAC) algorithm [2] is used, with modifications for discrete action spaces [3].
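
As a point of reference, the sketch below illustrates the policy objective of discrete-action SAC [3]: with a finite action set, the expectation over actions is computed exactly from the categorical policy instead of via reparameterization. This is an illustrative example assuming PyTorch tensors, not the code used in this repository.

```python
import torch

def discrete_sac_actor_loss(q1, q2, log_probs, alpha):
    """Illustrative discrete-SAC policy loss (a sketch, not the repo's implementation).

    q1, q2: (batch, n_actions) Q-value estimates from the two critics.
    log_probs: (batch, n_actions) log pi(a|s) from the categorical policy.
    alpha: entropy temperature.
    """
    probs = log_probs.exp()
    min_q = torch.min(q1, q2)  # clipped double-Q estimate
    # J_pi = E_s[ sum_a pi(a|s) * (alpha * log pi(a|s) - min_Q(s, a)) ]
    return (probs * (alpha * log_probs - min_q)).sum(dim=1).mean()
```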

Installation

  • Run source install_dependencies/install.sh. A Python virtual environment will be created and the necessary libraries will be installed. In addition, the repository directory will be added to the PYTHONPATH environment variable.

Run

  • Run python game/maze3d_human_only_test.py game/config/config_human_test.yaml <participant_name> for the human-only game.
  • Run python game/sac_maze3d_train.py game/config/<config_sac> <participant_name> for the human-agent game.
    • Notes before training:
      • Set <participant_name> to the participant's name.
      • The program will create a /tmp and a /plot folder (if they do not exist) inside the results/ folder. The /tmp folder contains CSV files with information about the game, and the /plot folder contains figures for the game. See the Experiment Result Output Files section below for more details.
      • The program automatically appends an identification number to the participant name in each folder it creates.

Configuration

  • In the game/config folder several YAML files exist for configuring the experiment. The main parameters are listed below (a short loading sketch follows the list).
    • game/discrete: True if the keyboard input is discrete (False for continuous). Details regarding the discrete and continuous human input modes can be found here.
    • SAC/reward_function: Type of reward function. Details about the predefined reward functions and how to define a new one can be found here.
    • Experiment/mode: How the experiment terminates; either after a set number of games or after a set number of interactions.
    • SAC/discrete: Discrete or standard SAC (currently only discrete SAC is compatible with the game).
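
As a rough illustration of how these parameters are organized, the snippet below loads one of the YAML configuration files and reads the keys listed above. The file name and exact nesting follow the "<section>/<key>" paths in this README and are assumptions; the actual configs may differ.

```python
import yaml

# Hypothetical sketch: file name and nesting are assumptions based on the
# "<section>/<key>" parameter paths listed above.
with open("game/config/config_sac.yaml") as f:
    cfg = yaml.safe_load(f)

discrete_input  = cfg["game"]["discrete"]         # True: discrete keyboard input
reward_function = cfg["SAC"]["reward_function"]   # which predefined reward function to use
experiment_mode = cfg["Experiment"]["mode"]       # stop after N games or N interactions
discrete_sac    = cfg["SAC"]["discrete"]          # only discrete SAC currently works with the game
```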

Play

Game

  • Human only: Use the Left and Right arrows to control the tilt of the tray around its y-axis, and the Up and Down arrows to control the tilt around its x-axis (the key bindings are sketched below).
  • Human-Agent: Use the Left and Right arrows to control the tilt of the tray around its y-axis.
  • Press the space key once to pause, and a second time to resume.
  • Press q to exit the experiment.
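
The key bindings above could be handled roughly as in the sketch below. This is an illustrative example that assumes a pygame-style event loop; the actual game may process input differently.

```python
import pygame

def handle_key(event, paused):
    """Minimal sketch of the controls listed above (pygame-style loop is an assumption).

    Returns (paused, tilt), where tilt is (axis, direction) or None.
    """
    if event.type != pygame.KEYDOWN:
        return paused, None
    if event.key == pygame.K_SPACE:
        return not paused, None            # toggle pause / resume
    if event.key == pygame.K_q:
        raise SystemExit                   # quit the experiment
    tilt = {pygame.K_LEFT:  ("y", -1), pygame.K_RIGHT: ("y", +1),
            pygame.K_UP:    ("x", +1), pygame.K_DOWN:  ("x", -1)}.get(event.key)
    return paused, tilt
```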

Citation

If you use this repository in your publication, please cite:

Fotios Lygerakis, Maria Dagioglou, and Vangelis Karkaletsis. 2021. Accelerating Human-Agent Collaborative Reinforcement Learning. In The 14th PErvasive Technologies Related to Assistive Environments Conference (PETRA 2021), June 29-July 2, 2021, Corfu, Greece. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3453892.3454004

Experiment Result Output Files

Contents of a /tmp folder. The terms "training/testing trial", "game step", and "experiment" are explained in detail in [4]:

  • actions.csv: All the actions performed during the experiment, in the format (a_agent, a_human).
  • avg_length_list.csv: The length of each training trial in game steps.
  • test_length_list.csv: The length of each test trial in game steps.
  • config_sac.yaml: The configuration file used for this experiment; it is saved so that the experiment can be replicated.
  • episode_durations.csv: The total duration of each training trial.
  • test_episode_duration_list.csv: The total duration of each testing trial.
  • grad_updates_durations.csv: The total duration of the offline gradient update session for each trial. Combined with episode_durations.csv, it is used to calculate the cumulative total time elapsed, as shown in Figure 4 of [4].
  • scores.csv: The total score for each training trial.
  • test_score_history.csv: The total score for each testing trial. The mean and standard error of the mean over each session are used for Figures 2 and 3 in [4] (see the sketch after this list).
  • rest_info.csv: The goal position, total experiment duration, best score achieved, the trial that achieved the best score, the best reward achieved, the length of the trial with the best score, the total number of time steps in the whole experiment, the total number of games played, the FPS the game ran at, and the average offline gradient update duration over all sessions.
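
For example, the per-session mean and standard error of the mean could be computed from test_score_history.csv roughly as sketched below. The single-column layout, the results path, and the session size are assumptions; the actual files may differ.

```python
import pandas as pd

# Hypothetical sketch: assumes one test-trial score per row and a fixed
# number of test trials per session.
path = "results/<experiment_folder>/tmp/test_score_history.csv"  # placeholder path
scores = pd.read_csv(path, header=None)[0]
trials_per_session = 10                          # assumed session size
sessions = scores.groupby(scores.index // trials_per_session)
summary = pd.DataFrame({"mean": sessions.mean(), "sem": sessions.sem()})
print(summary)
```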

Contents of a /plot folder:

  • episode_durations.png
  • grad_updates_durations.png
  • length.png
  • scores.png
  • test_episode_duration.png
  • test_length.png
  • test_scores.png
  • test_scores_mean_std.png
  • training_logs.pkl: A pandas DataFrame saved in pickle format that contains the action and state for each training game step.
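
The pickled training log can be inspected with pandas, for instance as below; the path is a placeholder and the column names are not documented here, so the comment is only indicative.

```python
import pandas as pd

# Sketch only: placeholder path; columns are assumptions.
logs = pd.read_pickle("results/<experiment_folder>/plot/training_logs.pkl")
print(logs.head())   # one row per training game step (action, state, ...)
```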

References

[1] Shafti, Ali, et al. "Real-world human-robot collaborative reinforcement learning." arXiv preprint arXiv:2003.01156 (2020).

[2] https://github.com/kengz/SLM-Lab

[3] Christodoulou, Petros. "Soft actor-critic for discrete action settings." arXiv preprint arXiv:1910.07207 (2019).

[4] Fotios Lygerakis, Maria Dagioglou, and Vangelis Karkaletsis. 2021. Accelerating Human-Agent Collaborative Reinforcement Learning. In The 14th PErvasive Technologies Related to Assistive Environments Conference (PETRA 2021), June 29-July 2, 2021, Corfu, Greece. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3453892.3454004
