Enhancing Lunar Lander Performance with Offline-online combined CQL-SAC Approaches

This repository contains the implementation of a hybrid Soft Actor-Critic (SAC) and Conservative Q-Learning (CQL) approach to the Lunar Lander environment from OpenAI Gym. The project aims to develop an agent that lands safely on the moon while minimizing fuel usage and risk.
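
As a rough illustration of the conservative part of the approach: CQL adds a regularizer to the critic loss that lowers Q-values across all actions while raising them for actions that appear in the dataset. The sketch below computes that term for a single state with discrete actions. The function name `cql_penalty` and the discrete-action setting are assumptions made for illustration; the code in CQL-SAC-Combine/agent.py may compute it differently (for example, by sampling actions from the policy).

```python
import math


def cql_penalty(q_values, dataset_action):
    """Conservative penalty for one state (hypothetical helper, not from this repo).

    q_values: list of Q(s, a) for every discrete action a.
    dataset_action: index of the action stored in the offline dataset.
    Returns logsumexp_a Q(s, a) - Q(s, a_data), which is always >= 0.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(q_values)
    logsumexp = m + math.log(sum(math.exp(q - m) for q in q_values))
    return logsumexp - q_values[dataset_action]


# The penalty is small when the dataset action already has the highest Q-value
# and large when other (possibly unseen) actions are valued more highly.
low = cql_penalty([5.0, 0.0, 0.0, 0.0], dataset_action=0)
high = cql_penalty([0.0, 5.0, 0.0, 0.0], dataset_action=0)
```

In a hybrid setup this term would typically be scaled by a weight and added to the usual SAC critic loss; that weight is a hyperparameter this sketch does not model.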

Project Structure

Here is an overview of the main files and directories in this repository:

D:.
│   buffer_ablation.py     # Script for ablation studies on the buffer
│   main.py                # Main script to run experiments
│   paper.pdf              # Project report
│   README.md              # This file
│   requirements.txt       # Python dependencies for the project
│   tree.txt               # File tree structure
│
├───CQL-SAC-Combine
│       agent.py           # Agent implementation for the hybrid model
│       buffer.py          # Replay buffer for the CQL-SAC model
│       eval.py            # Evaluation script for the CQL-SAC model
│       networks.py        # Neural network architectures
│       train.py           # Training loop for the CQL-SAC model
│       utils.py           # Utility functions
│
└───SAC-Online
        agent.py           # Agent implementation for SAC
        buffer.py          # Replay buffer for SAC
        eval.py            # Evaluation script for SAC
        generate_dataset.py   # Script to generate datasets from SAC
        networks.py        # Neural network architectures
        train.py           # Training loop for SAC
        utils.py           # Utility functions

Installation

To run the code, follow these setup instructions:

Clone the repository:

git clone https://github.com/TobyLeelsz/offline-online-combine-training.git
cd offline-online-combine-training

Install dependencies:
```
pip install -r requirements.txt
```
Register for Weights & Biases (wandb):

You need to set up an account on Weights & Biases to log and visualize the training process. Once registered, configure your environment:
```
wandb login
```
Follow the prompts to enter your API key.

Quick Start

To start an experiment with the default settings:

python main.py

This script will train a SAC agent, generate a dataset, and then train a CQL-SAC hybrid agent using the combined online and offline data. Note: Administrator privileges are required to run the script.
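
One way to picture "combined online and offline data" is a replay buffer that draws part of each batch from the fixed SAC-generated dataset and the rest from newly collected transitions. The following is a minimal sketch under that assumption; the class `MixedReplayBuffer`, its parameters, and the 50/50 default split are hypothetical and are not taken from buffer.py in this repository.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical buffer mixing a fixed offline dataset with online transitions."""

    def __init__(self, offline_transitions, online_capacity=10000,
                 offline_fraction=0.5, seed=None):
        self.offline = list(offline_transitions)     # fixed dataset, never evicted
        self.online = deque(maxlen=online_capacity)  # FIFO: oldest transitions drop out
        self.offline_fraction = offline_fraction
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store a transition collected during online interaction."""
        self.online.append(transition)

    def sample(self, batch_size):
        """Sample with replacement from both sources according to offline_fraction."""
        if not self.offline and not self.online:
            raise ValueError("cannot sample from an empty buffer")
        n_offline = round(batch_size * self.offline_fraction)
        # Fall back to a single source while the other one is still empty.
        if not self.online:
            n_offline = batch_size
        if not self.offline:
            n_offline = 0
        online = list(self.online)
        batch = [self.rng.choice(self.offline) for _ in range(n_offline)]
        batch += [self.rng.choice(online) for _ in range(batch_size - n_offline)]
        self.rng.shuffle(batch)
        return batch


# Usage with placeholder transitions (real ones would hold state, action, reward, ...).
buffer = MixedReplayBuffer([("offline", i) for i in range(100)], seed=0)
buffer.add(("online", 0))
batch = buffer.sample(8)
```

Keeping the offline data in a separate, non-evicting list means online experience can never push the dataset out of memory, at the cost of a fixed sampling ratio that has to be tuned.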

Customizing Experiments

Modify main.py to tweak hyperparameters or change the training configuration. To reduce computation time, consider setting the episodes and n_episode parameters to smaller values.

You may also run buffer_ablation.py to perform ablation studies on the buffer size and its effects.
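
The general shape of a buffer-size ablation is a sweep that repeats the same experiment at several buffer sizes and averages the result over seeds. The sketch below shows that pattern only; `run_buffer_ablation`, the `run_experiment` callable, and its `buffer_size` and `seed` arguments are hypothetical names, and buffer_ablation.py may be organized differently.

```python
def run_buffer_ablation(buffer_sizes, run_experiment, seeds=(0, 1, 2)):
    """Return the mean score per buffer size, averaged over seeds.

    run_experiment is a caller-supplied function (hypothetical signature)
    that trains and evaluates one agent and returns a single number.
    """
    results = {}
    for size in buffer_sizes:
        scores = [run_experiment(buffer_size=size, seed=seed) for seed in seeds]
        results[size] = sum(scores) / len(scores)
    return results


def fake_experiment(buffer_size, seed):
    """Stand-in for a real training run; returns a made-up deterministic score."""
    return buffer_size / 1000 + seed


# Usage: swap fake_experiment for a function that actually trains an agent.
summary = run_buffer_ablation([1000, 10000], fake_experiment)
```

Averaging over several seeds matters here because single reinforcement learning runs can vary widely, which would otherwise hide or exaggerate the effect of the buffer size.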

Contributing

Contributions to this project are welcome! Please fork the repository and submit a pull request with your proposed changes. For major changes, please open an issue first to discuss what you would like to change.

License

This project is open-source and available under the MIT License.
