AdLeap-MAS: An Open-source Multi-Agent Simulator for Ad-hoc Learning and Planning

Introduction

AdLeap-MAS is a framework for implementing and simulating ad-hoc reasoning domains, covering collaborative and adversarial contexts in which agents learn and plan in ad-hoc environments. The framework aims to make experiments in this domain easier to run and to allow existing code to be reused across different environments. In other words, it aims to minimise the implementation cost of the work that precedes a domain evaluation, which can include the environment design, component settings and benchmark set definition, while improving the robustness of the environment and reducing the errors introduced when code is adapted or reimplemented. Through a component-based architecture, AdLeap-MAS uses the OpenAI Gym package for Python 3 as the primary tool to define its base components. Designed as an open-source framework and a specialised version of the OpenAI Gym simulator, it offers base classes for implementing new contexts and scenarios of interest to the community.


Summary

In this README you can find:

GET STARTED

1. Dependencies 📝

You must install Python 3 and the OpenAI Gym package to run the framework. See the Python 3 website and the OpenAI Gym GitHub repository for installation instructions or, if you are working on Linux, run the following commands:

For Python 3:

sudo apt-get install software-properties-common && sudo add-apt-repository ppa:deadsnakes/ppa && sudo apt-get update && sudo apt-get install python3.8

For OpenAI Gym (Minimal Install):

pip install gym

For OpenAI Gym (Full Install):

apt-get install -y libglu1-mesa-dev libgl1-mesa-dev libosmesa6-dev xvfb ffmpeg curl patchelf libglfw3 libglfw3-dev cmake zlib1g zlib1g-dev swig

NOTE: make sure that the NumPy package is also installed for Python 3 before running the framework.

pip install numpy OR pip3 install numpy

Windows

To run this framework on Windows, you need to work within the Windows Subsystem for Linux. We recommend installing Ubuntu 16.04 or Ubuntu 18.04 as your Linux distribution in the Windows Subsystem. You must also install the VcXsrv Windows X Server for the framework to run correctly: it enables the visualisation of the simulated environment by creating the display needed to run your tests.

After starting VcXsrv (and before running the framework), select the following options on the start screen:

drawing drawing
drawing

NOTE: If your program still cannot access the virtual screen or the error NoSuchDisplay arises, the following line may fix the problem:

export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0


2. Usage 💪

With all dependencies installed, download this GitHub project into your local workspace. To start the framework, choose an experiment configuration and run the corresponding file experiment_[configuration].py. For example, from the main project directory, run:

python3 experiment_respawn.py

That's all folks. At this point, you will have the display popping up and the simulation starting with the default components.

NOTE: To run or implement different environments, create a new main file that follows the same routine used to run the Level-based Foraging or Truco environments:

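    """Generic AdLeap-MAS execution routine"""
    # `args`, `agent`, `max_episode` and `type_planning` are placeholders for
    # your experiment's settings, ad-hoc agent and chosen planning method.

    env = AdhocReasoningEnv(args)
    state = env.reset()
    done = False  # must be initialised before the loop condition is evaluated

    while not done and env.episode < max_episode:
        # Show the current state of the simulation
        env.render()

        # Ask the reasoning method for the ad-hoc agent's next action
        next_action, _ = type_planning(state, agent)

        # Advance the environment by one step
        state, reward, done, info = env.step(next_action)

    env.close()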
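The routine above can be tried end to end without the framework installed. The following is a minimal sketch, assuming a hypothetical `ToyEnv` class and a hypothetical `random_planning` function; neither is part of AdLeap-MAS, and they only mimic the `reset`/`step`/`render`/`close` interface and the `action, info` return convention used in the routine.

```python
import random


class ToyEnv:
    """Hypothetical stand-in that mimics the interface used by the routine."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.episode = 0
        self.steps = 0

    def reset(self):
        self.steps = 0
        return {"step": self.steps}

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.max_steps
        if done:
            self.episode += 1
        reward = 1.0 if action == "collect" else 0.0
        return {"step": self.steps}, reward, done, {}

    def render(self):
        pass  # no display in this sketch

    def close(self):
        pass


def random_planning(state, agent):
    """Hypothetical planner: returns a random action and no extra info."""
    return random.choice(["move", "collect"]), None


def run_episode(env, agent=None, max_episode=1):
    state = env.reset()
    done = False
    total_reward = 0.0
    while not done and env.episode < max_episode:
        env.render()
        next_action, _ = random_planning(state, agent)
        state, reward, done, info = env.step(next_action)
        total_reward += reward
    env.close()
    return state, total_reward


final_state, total = run_episode(ToyEnv(max_steps=5))
```

Swapping `ToyEnv` for a real environment and `random_planning` for a real reasoning method should leave `run_episode` unchanged, provided both follow the same interface.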

Understanding (High level view)

The AdLeap-MAS architecture is based on unidirectional, cyclical communication between modules: each piece of information is delivered directly and exclusively from one module to the next. This design lets the problem be simulated step by step, with each fragment of the simulation (i.e., each functionality) processed independently. As in a cascade workflow, the information is analysed and transformed at each step before being passed on. Furthermore, each module acts independently of the other components, so learning and reasoning are based solely on the delivered information. The following figure presents this idea at a high level.

drawing

From this perspective, we designed each component to achieve the final purpose. Consequently, we describe the desired final products for each component.

A question that may arise now is: how did we separate the environment from its components, and the components from their learning and reasoning modules? The answer is direct: we do not. However, since each module works strictly on the information it currently receives, it is reasonable to assume that this data provides sufficient knowledge to simulate the environment without a bidirectional communication channel. This makes it easier to modify the environment without affecting functionality that is already implemented and tested.
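The one-way, cyclical flow described in this section can be illustrated with a small sketch. All function names and the toy one-dimensional world below are hypothetical and are not taken from the framework; the point is only that each module receives the output of the previous one and nothing else.

```python
def environment_module(world, action):
    """Apply the action and deliver the new world state downstream."""
    new_world = dict(world)
    new_world["position"] = world["position"] + action
    return new_world


def observation_module(world):
    """Deliver only the visible part of the world state to the agent."""
    return {"position": world["position"]}


def reasoning_module(observation, goal=3):
    """Choose an action from the delivered observation alone."""
    if observation["position"] < goal:
        return 1
    if observation["position"] > goal:
        return -1
    return 0


world = {"position": 0, "hidden_value": 10}  # "hidden_value" never reaches the agent
action = 0
trace = []
for _ in range(5):
    # environment -> observation -> reasoning -> (back to) environment
    world = environment_module(world, action)
    observation = observation_module(world)
    action = reasoning_module(observation)
    trace.append(observation["position"])
```

Because no module calls back into the one that fed it, any of the three functions can be replaced without touching the others, as long as the shape of the delivered data is preserved.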

As a user, you must answer the following question to get started:

drawing

For further explanation, we suggest reading our paper (to appear).


3. How to change the components within the framework? 😨

Changing the components of the environment is REALLY not troublesome. The idea is simple: write the code that implements your desired element (an agent, a task or even a reasoning module) and add it to the environment's components dictionary. To make this clearer, the following code shows the base structure for plugging components into your experiment:

    """Generic AdLeap-MAS environment's components definition"""
    from your_agent_implementation_module import Agent
    from your_task_implementation_module import Task
    from your_environment_implementation_module import Environment

    components = {
        'agents': [
            Agent(index='A', atype='reasoning_1'),
            Agent(index='B', atype='reasoning_2'),
            Agent(index='C', atype='reasoning_3'),
            Agent(index='D', atype='reasoning_4'),
        ],
        'tasks': [
            Task('1', (2, 2), 1.0),
            Task('2', (4, 4), 1.0),
            Task('3', (5, 5), 1.0),
            Task('4', (8, 8), 1.0),
        ],
    }

    env = Environment(components)

That is it! At this point, your environment already includes the desired components for your case study.
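Since the snippet above imports from placeholder modules, it cannot run as written. The following is a self-contained sketch with hypothetical stand-in classes; the field names `position` and `weight`, the `Environment` constructor and the `agents_of_type` helper are assumptions made for illustration and may differ from the framework's base classes.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical stand-in for your agent implementation."""
    index: str
    atype: str  # name of the reasoning method used by this agent


@dataclass
class Task:
    """Hypothetical stand-in for your task implementation."""
    index: str
    position: tuple
    weight: float


class Environment:
    """Hypothetical stand-in that simply stores the components dictionary."""

    def __init__(self, components):
        self.components = components

    def agents_of_type(self, atype):
        return [a for a in self.components["agents"] if a.atype == atype]


components = {
    "agents": [
        Agent(index="A", atype="reasoning_1"),
        Agent(index="B", atype="reasoning_2"),
    ],
    "tasks": [
        Task("1", (2, 2), 1.0),
        Task("2", (4, 4), 1.0),
    ],
}

env = Environment(components)
```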

The reasoning modules do not need an explicit import because the framework implements a generic method to call them. Your reasoning module only needs the following structure to run within the architecture:

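    """Generic AdLeap-MAS reasoning modules implementation"""
    """- Example file name: mymethod.py"""

    def mymethod_planning(environment, adhoc_agent, *args, **kwargs):
        # Extra positional/keyword arguments are optional and method-specific.

        """ code here """
        action = None  # replace with the action chosen by your method

        # The execution routine discards the second return value.
        return action, None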

Again: that is all folks! At this point, your reasoning method can be used within the framework for every case study.
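This README does not show how the framework's generic call works. As an illustration only, one way such a lookup could be written is to resolve the function from the naming convention above (a module `<name>` exposing `<name>_planning`). The `call_planning` helper below is hypothetical, and the module is registered in memory so the sketch runs without a `mymethod.py` file on disk.

```python
import importlib
import sys
import types

# Hypothetical in-memory module standing in for a file named mymethod.py
mymethod = types.ModuleType("mymethod")


def mymethod_planning(environment, adhoc_agent):
    """Toy reasoning method: always returns the same action."""
    return "noop", None


mymethod.mymethod_planning = mymethod_planning
sys.modules["mymethod"] = mymethod


def call_planning(method_name, environment, adhoc_agent):
    """Resolve `<method_name>_planning` inside module `<method_name>` and call it."""
    module = importlib.import_module(method_name)
    planner = getattr(module, f"{method_name}_planning")
    return planner(environment, adhoc_agent)


action, info = call_planning("mymethod", environment=None, adhoc_agent=None)
```

With a lookup of this kind, adding a new reasoning method only requires a new file that follows the naming convention; no import statement has to change in the experiment script.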

For further explanation, we suggest reading our paper (to appear).


EXAMPLES

Initially introduced to evaluate ad hoc teamwork, the Level-based Foraging domain [1, 2] is a problem in which a team of agents must collaborate to accomplish a certain number of tasks in an environment, minimising the time spent through active collaboration and coordination. Each agent has a level (strength) that determines whether it can collect an item (e.g., a box) of a specific weight. The boxes are distributed in the environment, and the agents cannot communicate with their teammates. The following figure illustrates the problem.

drawing

As presented, AdLeap-MAS implements this problem while enabling the simulation of (i) real-time decision-making (instead of a turn-based approach) and (ii) online learning and planning. The environment delivers only the visible information to the agents, leaving them responsible for reasoning about the missing data and building the corresponding belief state. In this domain, agents have four parameters (level, vision radius, vision angle and type) and tasks have one parameter (weight). The initial positions and these parameters are all concealed from the agents.
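To make the role of these parameters concrete, here is a minimal sketch of a visibility test based on vision radius and vision angle, and of a collection test based on level and weight. The function names, the use of a heading in radians, and the rule that the levels of cooperating agents are summed are assumptions made for illustration; the framework's own implementation may differ.

```python
import math


def is_visible(agent_pos, agent_heading, target_pos, radius, angle):
    """Return True if the target lies within the agent's vision cone.

    `agent_heading` and `angle` are in radians; `angle` is the full
    opening of the cone, centred on the heading.
    """
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        return True
    if distance > radius:
        return False
    bearing = math.atan2(dy, dx)
    # Smallest signed difference between bearing and heading, in [-pi, pi)
    diff = (bearing - agent_heading + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= angle / 2


def can_collect(agent_levels, task_weight):
    """Assumed rule: the summed levels must reach the task weight."""
    return sum(agent_levels) >= task_weight


ahead = is_visible((0, 0), 0.0, (2, 0), radius=3, angle=math.pi / 2)
beside = is_visible((0, 0), 0.0, (0, 2), radius=3, angle=math.pi / 2)
too_far = is_visible((0, 0), 0.0, (5, 0), radius=3, angle=math.pi / 2)
```

An environment that filters the state with a test like `is_visible` before delivering it would give each agent only a partial view, which is what forces the agent to maintain a belief over the rest.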

A popular card game in Brazil, Truco is played by pairs of people forming two teams. The game starts by dealing three cards to each player and turning one card face up on the table. Each card has a strength determined by its rank and suit, which is used to compare the cards against one another. A team's goal is to score 12 points over a maximum of 23 rounds, each round being played as a best-of-three. A team scores 1 point when it wins a best-of-three round. The following figure illustrates a Truco table with four players.

drawing
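The scoring described above can be sketched as follows. This is a simplified illustration, not the framework's Truco implementation: card strengths are plain numbers, tied tricks are ignored, and the function names are hypothetical.

```python
def round_winner(tricks):
    """Decide a best-of-three round.

    `tricks` is a list of up to three (strength_team_0, strength_team_1)
    pairs, one per trick. Returns 0 or 1 for the winning team, or None
    if neither team wins more tricks than the other.
    """
    wins = [0, 0]
    for strength_0, strength_1 in tricks:
        if strength_0 > strength_1:
            wins[0] += 1
        elif strength_1 > strength_0:
            wins[1] += 1
        if max(wins) == 2:  # round already decided
            break
    if wins[0] == wins[1]:
        return None
    return 0 if wins[0] > wins[1] else 1


def play_match(rounds, target=12, max_rounds=23):
    """Accumulate 1 point per round won until a team reaches the target."""
    score = [0, 0]
    for tricks in rounds[:max_rounds]:
        winner = round_winner(tricks)
        if winner is not None:
            score[winner] += 1
        if max(score) >= target:
            break
    return score


sample_round = [(3, 1), (2, 5), (4, 2)]  # team 0 wins tricks 1 and 3
final_score = play_match([sample_round] * 12)
```

Note that 23 rounds is exactly the number needed to guarantee a winner at 12 points when every round awards one point, since two teams can share at most 22 points without either reaching 12.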

As an environment completely distinct from the Level-based Foraging domain, this card game is implemented mainly to show the versatility of AdLeap-MAS. The implementation enables the simulation of (i) turn-based decision-making and (ii) online learning from partial information, obtained mainly by observing the plays of adversaries and teammates. Additionally, the environment delivers only the visible information to the agents, leaving them to reason about the missing data and build their belief state. Note that although all hands are visible in the interface, this does not reflect the information actually available to the agents.


DEVELOPMENT INFORMATION

Status: In development. 💻

More information will be documented and presented soon.


REFERENCES

[1] Stefano V Albrecht and Subramanian Ramamoorthy. 2015. A game-theoretic model and best-response learning method for ad hoc coordination in multi agent systems. arXiv preprint arXiv:1506.01170 (2015).

[2] Peter Stone, Gal A. Kaminka, Sarit Kraus, and Jeffrey S. Rosenschein. 2010. Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination. In Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI).