This repository implements a hybrid of Soft Actor-Critic (SAC) and Conservative Q-Learning (CQL) for the Lunar Lander environment in OpenAI Gym. The goal is an agent that lands safely on the moon while using as little fuel as possible and avoiding risky maneuvers.
Here is an overview of the main files and directories in this repository:
D:.
│ buffer_ablation.py # Script for ablation studies on the buffer
│ main.py # Main script to run experiments
│ paper.pdf # Project report
│ README.md # This file
│ requirements.txt # Python dependencies for the project
│ tree.txt # File tree structure
│
├───CQL-SAC-Combine
│ agent.py # Agent implementation for the hybrid model
│ buffer.py # Replay buffer for the CQL-SAC model
│ eval.py # Evaluation script for the CQL-SAC model
│ networks.py # Neural network architectures
│ train.py # Training loop for the CQL-SAC model
│ utils.py # Utility functions
│
└───SAC-Online
    agent.py # Agent implementation for SAC
    buffer.py # Replay buffer for SAC
    eval.py # Evaluation script for SAC
    generate_dataset.py # Script to generate datasets from SAC
    networks.py # Neural network architectures
    train.py # Training loop for SAC
    utils.py # Utility functions
To run the code, follow these setup instructions:
- Clone the repository:
  git clone https://github.com/TobyLeelsz/offline-online-combine-training.git
  cd offline-online-combine-training
- Install dependencies:
  pip install -r requirements.txt
- Register for Weights & Biases (wandb):
  You need to set up an account on Weights & Biases to log and visualize the training process. Once registered, configure your environment:
  wandb login
  Follow the prompts to enter your API key.
To start an experiment with the default settings:
python main.py
This script will train a SAC agent, generate a dataset, and then train a CQL-SAC hybrid agent using the combined online and offline data. Note: Administrator privileges are required to run the script.
Modify main.py to tweak hyperparameters or change the training configuration. To keep computation manageable, it is recommended to set the parameters episodes and n_episode to smaller values.
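As a rough sketch of what a reduced configuration might look like, the snippet below scales both parameters down by a constant factor. The default values shown are placeholders, and the dictionary layout is an assumption; check main.py for how `episodes` and `n_episode` are actually defined.

```python
# Placeholder defaults for illustration only; the real values live in main.py.
FULL_RUN = {"episodes": 1000, "n_episode": 500}


def scaled_down(config, factor=10, minimum=1):
    """Return a copy of config with every value divided by factor."""
    return {name: max(minimum, value // factor) for name, value in config.items()}


quick_run = scaled_down(FULL_RUN)
print(quick_run)
```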
You may also run buffer_ablation.py to perform ablation studies on the buffer size and its effects.
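To illustrate why buffer size matters, the sketch below sweeps over a few capacities and shows how much history each one retains. It assumes a first-in, first-out replay buffer (modeled here with `collections.deque`); the capacities, the step count, and the helper `fill_buffer` are hypothetical, and buffer_ablation.py may measure different quantities.

```python
from collections import deque


def fill_buffer(capacity, n_transitions):
    """Push integer-tagged transitions into a FIFO buffer of fixed capacity."""
    buffer = deque(maxlen=capacity)  # oldest entries are evicted once full
    for step in range(n_transitions):
        buffer.append(step)
    return buffer


# Sweep over hypothetical buffer sizes and record what each one retains
results = {}
for capacity in (100, 1_000, 10_000):
    buf = fill_buffer(capacity, n_transitions=5_000)
    results[capacity] = (len(buf), buf[0])  # (stored count, oldest step kept)
    print(f"capacity={capacity:>6}  stored={len(buf):>5}  oldest_step={buf[0]}")
```

A small buffer holds only the most recent experience, while a large one keeps older transitions collected under earlier policies; an ablation compares how that trade-off affects training.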
Contributions to this project are welcome! Please fork the repository and submit a pull request with your proposed changes. For major changes, please open an issue first to discuss what you would like to change.
This project is open-source and available under the MIT License.