🔗 Documentation | 🔗 PyPI Package
Rapidae is a Python library that simplifies creating and experimenting with autoencoder models. With a focus on ease of use, it lets users explore and develop autoencoders in an efficient and straightforward manner.
I decided to develop this library to optimize my research workflow and provide a comprehensive resource for educators and learners exploring autoencoders.
As a researcher, I often found myself spending time on repetitive tasks, such as creating project structures or replicating baseline models. (I've lost count of how many times I've gone through the Keras VAE tutorial just to copy the model as a baseline for other experiments.)
As an educator, despite recognizing numerous fantastic online resources, I felt the need for a place that consolidates the features I consider important for teaching these models: explanation, implementation, and versatility across different backends. The latter is particularly crucial, considering that PyTorch practitioners may find it tedious to switch to TensorFlow, and vice versa. Because it builds on the recently released Keras 3, Rapidae lets users focus on model creation rather than backend specifics.
In summary, this library is designed to be simple enough for educational purposes, yet robust for researchers to concentrate on developing their models and conducting benchmark experiments in a unified environment.
Note
Shout out to Pythae, which provides an excellent library for experimenting with VAEs. If you're looking for a quick way to implement autoencoders for image applications, Pythae is probably your best option. Rapidae differs from Pythae in the following ways:
- It is built on Keras 3, allowing you to experiment with and provide your implementations in either PyTorch, TensorFlow, or JAX.
- The image models implemented in Rapidae are primarily designed for educational purposes.
- Rapidae is intended to serve as a benchmarking library for models implemented in the sequential/time-series domain, as these are widely dispersed across various fields.
🚨Call for contributions🚨
If you want to add your model to the package or collaborate on its development, feel free to shoot me a message at [email protected], or just open an issue or a pull request. I'll be happy to collaborate with you.
- Main Features
- Overview
- Installation
- Available models
- Usage
- Custom models and architectures
- Switching backends
- Experiment tracking with wandb
- Documentation
- Citing this repository
- Ease of Use: Rapidae has been designed to make creating and experimenting with autoencoders as simple as possible. Users can create and train autoencoder models with just a few lines of code.
- Backend versatility: Rapidae relies on Keras 3.0, which is backend agnostic, so you can switch freely between TensorFlow, PyTorch, and JAX.
- Customization: Easily customize model architecture, loss functions, and training parameters to suit your specific use case.
- Experimentation: Conduct experiments with different hyperparameters and configurations to optimize the performance of your models.
Rapidae is structured as follows:
- data: This module contains everything related to the acquisition and preprocessing of datasets.
- models: This is the core module of the library. It includes the base architectures on which new ones can be created, several predefined architectures, and a list of predefined default encoders and decoders.
- pipelines: Pipelines are designed to perform a specific task or set of tasks, such as data preprocessing or model training.
- evaluate: Its main functionality is the evaluation of model performance. It also includes utilities for various tasks: latent space visualization, reconstructions, evaluation, etc.
The library has been tested with Python versions >=3.10, <3.12, so we recommend first creating a virtual environment with a suitable Python version. Here's an example with conda:
conda create -n rapidae python=3.10
Then, just activate the environment with conda activate rapidae and install the library.
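If you are unsure whether the interpreter in your environment falls inside the tested range, a quick check like the following can help. This is a minimal sketch, not part of Rapidae: the function name `is_supported_python` and the two constants are hypothetical, and the bounds are simply the ones stated above.

```python
import sys

# Bounds taken from the tested range stated above: >=3.10, <3.12.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 12)


def is_supported_python(version=None):
    """Return True if the (major, minor) version lies in the tested range.

    `version` is any tuple-like such as (3, 11) or sys.version_info;
    it defaults to the running interpreter.
    """
    major_minor = tuple(version or sys.version_info)[:2]
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if not is_supported_python():
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Warning: Python {current} is outside the tested range")
```

Running it inside the activated environment prints a warning only when the interpreter is older than 3.10 or is 3.12 or newer.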
Note
If you are using Google Colab, you are good to go (i.e. you do not need to create an environment). The library is fully compatible with Colab's default environment.
To install the latest stable release of this library run the following:
pip install rapidae
Note that you will also need to install a backend framework. Here are the official installation guidelines:
Important
If you install TensorFlow, you should reinstall Keras 3 afterwards via pip install --upgrade keras. This is a temporary step while TensorFlow is pinned to Keras 2, and will no longer be necessary after TensorFlow 2.16. The cause is that tensorflow==2.15 will overwrite your Keras installation with keras==2.15.
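To confirm that the Keras reinstall took effect, you can inspect the installed package version without importing Keras itself. The sketch below uses only the standard library; the helper names `is_keras_3` and `installed_keras_version` are hypothetical and not part of Rapidae or Keras.

```python
from importlib import metadata


def is_keras_3(version_string):
    """Return True if a version string such as '3.0.5' has major version >= 3."""
    major = int(version_string.split(".")[0])
    return major >= 3


def installed_keras_version():
    """Return the installed Keras version string, or None if Keras is absent."""
    try:
        return metadata.version("keras")
    except metadata.PackageNotFoundError:
        return None


version = installed_keras_version()
if version is None:
    print("Keras is not installed")
elif is_keras_3(version):
    print(f"Keras {version} detected")
else:
    # Typical after installing tensorflow==2.15, which pins keras==2.15.
    print(f"Keras {version} detected; run: pip install --upgrade keras")
```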
You can also clone the repo to have full access to all the code. Some features may not yet be available in the published stable version, so this is the best way to stay up to date with the latest changes.
git clone https://github.com/NahuelCostaCortez/rapidae
cd rapidae
Then you only have to install the requirements:
pip install -r requirements.txt
Below is the list of the models currently implemented in the library.
Models | Training example | Paper | Official Implementation |
---|---|---|---|
Autoencoder (AE) | link | | |
Beta Variational Autoencoder (BetaVAE) | link | | |
Contractive Autoencoder | link | | |
Denoising Autoencoder | link | link | |
Hierarchical Variational Autoencoder (HVAE) | SOON | link | link |
ICFormer | SOON | link | link |
interval-valued Variational Autoencoder (iVAE) | IN PROGRESS | | |
Recurrent Variational AutoEncoder (RVAE) | link | link | |
Recurrent Variational Encoder (RVE) | link | link | |
Sparse Autoencoder | link | | |
Time VAE | link | | |
Variational Autoencoder (VAE) | link | link | |
Vector Quantised-Variational AutoEncoder (VQ-VAE) | link | link | |
Here you have a simple tutorial covering the most relevant aspects of the library. In addition, the examples folder contains a series of notebooks, one for each model and others for particular use cases.
You can also use a web interface made with Streamlit where you can load datasets, configure models and hyperparameters, train, and evaluate the results. Check the web interface notebook.
You can provide your own autoencoder architecture. Here's an example for defining a custom encoder and a custom decoder:
from rapidae.models.base import BaseEncoder, BaseDecoder
from keras.layers import Dense


class Custom_Encoder(BaseEncoder):
    # You can add more arguments, but at least these are required
    def __init__(self, input_dim, latent_dim, **kwargs):
        BaseEncoder.__init__(self, input_dim=input_dim, latent_dim=latent_dim)
        self.layer_1 = Dense(300)
        self.layer_2 = Dense(150)
        self.layer_3 = Dense(self.latent_dim)

    def call(self, x):
        x = self.layer_1(x)
        x = self.layer_2(x)
        x = self.layer_3(x)
        return x


class Custom_Decoder(BaseDecoder):
    # You can add more arguments, but at least these are required
    def __init__(self, input_dim, latent_dim, **kwargs):
        BaseDecoder.__init__(self, input_dim=input_dim, latent_dim=latent_dim)
        self.layer_1 = Dense(self.latent_dim)
        self.layer_2 = Dense(self.input_dim)

    def call(self, x):
        x = self.layer_1(x)
        x = self.layer_2(x)
        return x
You can also provide a custom model. This is especially useful if you want to implement your own loss function.
from rapidae.models.base import BaseAE
from keras.ops import mean
from keras.losses import mean_squared_error


class CustomModel(BaseAE):
    def __init__(self, input_dim, latent_dim, encoder, decoder):
        # If you are adding your model to the source code there is no need to
        # specify the encoder and decoder: just place them in the same
        # directory as the model and the BaseAE constructor will initialize them
        BaseAE.__init__(
            self,
            input_dim=input_dim,
            latent_dim=latent_dim,
            encoder=encoder,
            decoder=decoder
        )

    def call(self, x):
        # IMPLEMENT FORWARD PASS
        x = self.encoder(x)
        x = self.decoder(x)
        return x

    def compute_loss(self, x=None, y=None, y_pred=None, sample_weight=None):
        '''
        Computes the loss of the model.
        x: input data
        y: target data
        y_pred: predicted data (output of call)
        sample_weight: Optional array of the same length as x, containing
            weights to apply to the model's loss for each sample
        '''
        # IMPLEMENT LOSS FUNCTION
        loss = mean(mean_squared_error(x, y_pred))
        return loss
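To make the example loss concrete, here is what `mean(mean_squared_error(x, y_pred))` works out to on a tiny batch, written in plain Python so it runs without any backend. This is an illustrative sketch only: the helper names are hypothetical, and it assumes the usual Keras behaviour in which `mean_squared_error` averages the squared error over the last axis, giving one value per sample.

```python
def mean_squared_error_per_sample(x, y_pred):
    """Average the squared error over the feature axis, one value per sample."""
    return [
        sum((a - b) ** 2 for a, b in zip(row_x, row_pred)) / len(row_x)
        for row_x, row_pred in zip(x, y_pred)
    ]


def reconstruction_loss(x, y_pred):
    """Average the per-sample errors over the batch to get a scalar loss."""
    per_sample = mean_squared_error_per_sample(x, y_pred)
    return sum(per_sample) / len(per_sample)


# Two samples with two features each.
x = [[0.0, 1.0], [1.0, 1.0]]
y_pred = [[0.0, 0.5], [0.0, 1.0]]

print(mean_squared_error_per_sample(x, y_pred))  # [0.125, 0.5]
print(reconstruction_loss(x, y_pred))            # 0.3125
```

Any custom loss you write in `compute_loss` should likewise reduce to a single scalar per batch.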
Since Rapidae uses Keras 3, you can easily switch among TensorFlow, PyTorch, and JAX (TensorFlow is selected by default).
You can export the environment variable KERAS_BACKEND or you can edit your local config file at ~/.keras/keras.json to configure your backend. Available backend options are: "jax", "tensorflow", "torch". Example:
export KERAS_BACKEND="torch"
In a notebook, you can do:
import os
os.environ["KERAS_BACKEND"] = "torch"
import keras
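For the config-file route mentioned above, the backend is stored under the "backend" key of keras.json. The following is a minimal sketch of updating that key from Python while leaving any other settings untouched; the helper `set_keras_backend` is hypothetical, and the demo writes to a temporary directory rather than your real ~/.keras/keras.json.

```python
import json
import tempfile
from pathlib import Path

# Backend names listed in the text above.
VALID_BACKENDS = {"jax", "tensorflow", "torch"}


def set_keras_backend(backend, config_path):
    """Set the "backend" key in a keras.json file, preserving other keys."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"Unknown backend: {backend!r}")
    config_path = Path(config_path)
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text())
    config["backend"] = backend
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=4))
    return config


# Demo against a throwaway directory; for real use, pass
# Path.home() / ".keras" / "keras.json" instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / ".keras" / "keras.json"
    demo_config = set_keras_backend("torch", demo_path)
    print(demo_config)
```

Keras reads this configuration when it is imported, so change the file before `import keras` (or restart the notebook kernel afterwards).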
If you want to add experiment tracking to Rapidae models, you can create a Wandb callback and pass it to the TrainingPipeline as follows (the same approach applies to other experiment tracking frameworks):
wandb_cb = WandbCallback()
wandb_cb.setup(
    training_config=your_training_config,
    model_config=your_model_config,
    project_name="your_wandb_project",
    entity_name="your_wandb_entity",
)
pipeline = TrainingPipeline(
    name="your_pipeline_name",
    model=model,
    callbacks=[wandb_cb],
)
Check out the full documentation for detailed information on installation, usage, examples and recipes: 🔗 Documentation Link
All documentation source and configuration files are located inside the docs directory.
If you experience any issues while running the code, or would like to request new features or models, please open an issue on GitHub.
If you find this work useful or incorporate it into your research, please consider citing it 🙏🏻.
@software{Costa_Rapidae,
  author = {Costa, Nahuel},
  license = {Apache-2.0},
  title = {{Rapidae}},
  url = {https://github.com/NahuelCostaCortez/rapidae}
}