This project introduces the game "Among Us" as a model organism for lying and deception and studies how AI agents learn to express lying and deception, while evaluating the effectiveness of AI safety techniques to detect and control out-of-distribution deception.
The aim is to simulate the popular multiplayer game "Among Us" using AI agents and analyze their behavior, particularly their ability to deceive and lie, which is central to the game's mechanics.
-
Clone the repository:
git clone https://github.com/7vik/AmongUs.git cd AmongUs
-
Set up the environment:
conda create -n amongus python=3.10 conda activate amongus
-
Install dependencies:
pip install -r requirements.txt
To run the sandbox and log games of various LLMs playing against each other, run:
main.py
You will need to add a .env
file with an OpenRouter API key.
Alternatively, you can download 400 full-game logs (for Phi-4-15b
and Llama-3.3-70b-instruct
) and 810 game summaries from the HuggingFace dataset to reproduce the results in the paper (and evaluate your own techniques!).
To reproduce our Deception ELO and Win Rate results, run:
python elo/deception_elo.py
Once the (full) game logs are in place, use the following command to cache the activations of the LLMs:
python linear-probes/cache_activations.py --dataset <dataset_name>
This loads up the HuggingFace models and caches the activations of the specified layers for each game action step. This step is computationally expensive, so it is recommended to run this using GPUs.
Use configs.py
to specify the model and layer to cache, and other configuration options.
To evaluate the game actions by passing agent outputs to an LLM, run:
bash evaluations/run_evals.sh
You will need to add a .env
file with an OpenAI API key.
Alternatively, you can download the ground truth labels from the HuggingFace.
(TODO)
Once the activations are cached, training linear probes is easy. Just run:
python linear-probes/train_all_probes.py
You can choose which datasets to train probes on - by default, it will train on all datasets.
To evaluate the linear probes, run:
python linear-probes/eval_all_probes.py
You can choose which datasets to evaluate probes on - by default, it will evaluate on all datasets.
It will store the results in linear-probes/results/
, which are used to generate the plots in the paper.
We use the Goodfire API to evaluate SAE features on the game logs. To do this, run the notebook:
reports/2025_02_27_sparse_autoencoders.ipynb
You will need to add a .env
file with a Goodfire API key.
.
├── CONTRIBUTING.md # Contribution guidelines
├── Dockerfile # Docker setup for project environment
├── LICENSE # License information
├── README.md # Project documentation (this file)
├── among-agents # Main code for the Among Us agents
│ ├── README.md # Documentation for agent implementation
│ ├── amongagents # Core agent and environment modules
│ ├── envs # Game environment and configurations
│ ├── evaluation # Evaluation scripts for agent performance
│ ├── notebooks # Jupyter notebooks for running experiments
│ ├── requirements.txt # Python dependencies for agents
│ └── setup.py # Setup script for agent package
├── expt-logs # Experiment logs
├── k8s # Kubernetes configurations for deployment
├── main.py # Main entry point for running the game
├── notebooks # Additional notebooks (not part of the main project)
├── reports # Experiment reports
├── requirements.txt # Python dependencies for main project
├── tests # Unit tests for project functionality
└── utils.py # Utility functions
See CONTRIBUTING.md for details on how to contribute to this project.
This project is licensed under CC0 1.0 Universal - see LICENSE.
- Our game logic uses a bunch of code from AmongAgents.
If you face any bugs or issues with this codebase, please contact Satvik Golechha (7vik) at [email protected].