Image Generation Using VQVAE

🚀 Introduction to Generative AI with VQVAE

The Vector Quantized Variational Autoencoder (VQ-VAE) is a powerful approach in generative AI for creating high-quality images. This project implements a VQ-VAE combined with an autoregressive prior (GPT) to generate novel images with impressive fidelity and diversity.

Unlike traditional GANs or vanilla VAEs, the VQ-VAE framework offers several advantages in the generative AI space:

  • Discrete latent representations that capture meaningful semantic features
  • High-quality image generation without mode collapse issues
  • Controllable generation through manipulations in latent space
  • Efficient sampling compared to diffusion models

🔍 Technical Overview of VQVAE Architecture

VQ-VAE differs from standard VAEs in two fundamental ways:

  1. The encoder network outputs discrete codes rather than continuous vectors
  2. A learnable prior replaces the static prior distribution

The vector quantization (VQ) mechanism enables the model to avoid posterior collapse, a common issue in VAE frameworks where latents are ignored when paired with powerful autoregressive decoders. By using discrete latent representations and training an autoregressive prior, this model can generate high-quality images while maintaining diversity.

[Figure: VQVAE model architecture]

[Figure: quantization module]
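The core of the quantization module is a nearest-neighbour lookup into the codebook, trained with a straight-through gradient estimator. Below is a minimal PyTorch sketch of this step; the class name, codebook size, and commitment cost are illustrative assumptions, not taken from this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Maps continuous encoder outputs to their nearest codebook vectors."""

    def __init__(self, num_embeddings=512, embedding_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_embeddings, embedding_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_embeddings, 1.0 / num_embeddings)
        self.beta = beta  # commitment cost weighting the encoder-side loss term

    def forward(self, z_e):
        # z_e: (B, C, H, W) continuous encoder output -> (B*H*W, C) rows
        z = z_e.permute(0, 2, 3, 1).contiguous()
        flat = z.view(-1, z.shape[-1])

        # Squared L2 distance from every row to every codebook entry
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.t()
                 + self.codebook.weight.pow(2).sum(1))
        indices = dists.argmin(dim=1)
        z_q = self.codebook(indices).view(z.shape)

        # Codebook loss pulls embeddings toward encoder outputs;
        # commitment loss keeps encoder outputs close to their chosen codes
        vq_loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())

        # Straight-through estimator: gradients flow from z_q back to z_e
        z_q = z + (z_q - z).detach()
        return z_q.permute(0, 3, 1, 2).contiguous(), vq_loss, indices.view(z.shape[:-1])
```

Because the argmin lookup is non-differentiable, the straight-through trick copies the decoder's gradient past the quantization step, which is what lets the encoder train end to end.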

🛠️ Two-Stage Training Process

This generative AI system is trained in two distinct stages:

Stage 1: VQVAE Training

  • The VQVAE is trained on an image reconstruction task to learn discrete features from the input data
  • The encoder compresses images into a discrete latent space
  • The decoder learns to reconstruct the original images from these discrete codes
  • The vector quantization layer maps continuous representations to the nearest vectors in a learned codebook (one full training step is sketched after this list)
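Putting those pieces together, one Stage 1 training step might look like the following sketch; the encoder, decoder, and optimizer are assumed to exist, and MSE is used here as the reconstruction loss:

```python
import torch.nn.functional as F

def vqvae_training_step(encoder, quantizer, decoder, optimizer, images):
    """One reconstruction step: encode -> quantize -> decode -> backprop."""
    z_e = encoder(images)             # continuous latents, e.g. (B, C, H', W')
    z_q, vq_loss, _ = quantizer(z_e)  # nearest-codebook lookup plus VQ losses
    recon = decoder(z_q)              # reconstructed images

    # Total loss = pixel reconstruction + codebook/commitment terms
    loss = F.mse_loss(recon, images) + vq_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```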

Stage 2: Autoregressive Prior Training

  • After VQVAE training, we collect all discrete latent codes from our training images
  • A GPT model serves as the autoregressive prior, learning to predict the next latent code from the previous ones
  • This prior captures the statistical dependencies between latent codes, enabling coherent image generation (see the sketch below)
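In code, Stage 2 reduces to dumping each image's code grid as a flat sequence and training the GPT with ordinary next-token cross-entropy. A minimal sketch, reusing the `VectorQuantizer` above; the `gpt` model (returning logits of shape `(B, T, vocab)`) and the data loader are assumptions:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def extract_codes(encoder, quantizer, loader):
    """Encode the training set into sequences of discrete codebook indices."""
    sequences = []
    for images, _ in loader:
        z_e = encoder(images)
        _, _, indices = quantizer(z_e)        # (B, H', W') integer code grid
        sequences.append(indices.flatten(1))  # raster-scan order -> (B, H'*W')
    return torch.cat(sequences)

def prior_training_step(gpt, optimizer, codes):
    """Next-token prediction over latent code sequences."""
    inputs, targets = codes[:, :-1], codes[:, 1:]
    logits = gpt(inputs)  # (B, T, vocab), where vocab = codebook size
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```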

[Figure: discrete latent codes from the trained VQVAE]

[Figure: training the GPT prior with next-token prediction]

📊 Results and Evaluation

VQVAE Reconstructions

The VQVAE model demonstrates strong reconstruction capabilities, preserving key visual elements while compressing images to discrete latent codes:

[Figure: original images and their VQVAE reconstructions]

Generated Images

Novel images generated by sampling from the GPT prior and decoding with the VQVAE decoder:

[Figure: images sampled from the GPT prior]
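Generation chains the two trained models: sample a code sequence from the GPT, reshape it into a grid, embed it through the codebook, and decode. A minimal sketch, assuming a start token at index 0 and an 8x8 latent grid (both illustrative choices, not confirmed by this repository):

```python
import torch

@torch.no_grad()
def generate_images(gpt, quantizer, decoder, latent_hw=(8, 8), batch=16, temperature=1.0):
    """Sample latent codes autoregressively, then decode them to pixels."""
    h, w = latent_hw
    codes = torch.zeros(batch, 1, dtype=torch.long)  # start token (assumed index 0)
    for _ in range(h * w):
        logits = gpt(codes)[:, -1] / temperature     # logits over the codebook
        next_code = torch.multinomial(logits.softmax(dim=-1), num_samples=1)
        codes = torch.cat([codes, next_code], dim=1)

    grid = codes[:, 1:].view(batch, h, w)                # drop the start token
    z_q = quantizer.codebook(grid).permute(0, 3, 1, 2)   # (B, C, h, w) embeddings
    return decoder(z_q)                                  # pixel-space images
```

Raising the sampling temperature trades coherence for diversity; lowering it does the opposite.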

💡 Key Advantages of VQVAE in Generative AI

  • Discrete Latent Space: Unlike continuous latent models, VQVAE creates a more structured and interpretable representation
  • High Fidelity: Generates sharp, detailed images without the blurriness common in vanilla VAEs
  • Efficient Sampling: Once trained, generation is faster than many iterative approaches like diffusion models
  • Scalability: The architecture can be adapted to various domains beyond images (audio, video, etc.)
  • Controllable Generation: The discrete nature of the latent space facilitates manipulation and controlled generation

🔄 Comparison with Other Generative AI Approaches

| Model Type | Latent Space | Training Stability | Sample Quality | Sampling Speed |
| --- | --- | --- | --- | --- |
| VQ-VAE + GPT | Discrete | High | High | Fast |
| GAN | Continuous | Low (mode collapse) | High | Fast |
| Vanilla VAE | Continuous | High | Medium | Fast |
| Diffusion Models | N/A | High | Very High | Slow |

🚶‍♀️ Next Steps

Potential improvements and extensions to this generative AI system:

  • Implement conditional generation capabilities
  • Explore hierarchical VQ-VAE architectures for higher resolution images
  • Incorporate attention mechanisms in the prior model
  • Experiment with different codebook sizes and dimensions
  • Apply the model to specialized domains like medical imaging or satellite imagery
