This update reduces memory consumption by 3x, making it possible to process UHD/4K resolutions on GPUs with 8 GB of memory.
Most of the optimizations are in the Attention layer, where we consolidated in-place operations and split tensors into smaller batches to reduce peak memory usage.
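As a rough illustration of the batching idea (not the code in this release), the sketch below computes attention over the queries in small chunks, so only a `chunk_size x len(k)` block of scores exists at any time instead of the full `len(q) x len(k)` matrix. The function names `attention_full` and `attention_chunked` and the `chunk_size` parameter are hypothetical, and plain Python lists stand in for GPU tensors.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_full(q, k, v):
    """Reference attention: materializes the whole len(q) x len(k) score matrix."""
    scale = 1.0 / math.sqrt(len(k[0]))
    scores = [
        [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        for q_row in q
    ]
    weights = [softmax(row) for row in scores]
    width = len(v[0])
    return [
        [sum(w * v_row[j] for w, v_row in zip(w_row, v)) for j in range(width)]
        for w_row in weights
    ]


def attention_chunked(q, k, v, chunk_size):
    """Hypothetical sketch: same result, but scores are built one query chunk at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    out = []
    for start in range(0, len(q), chunk_size):
        # Only chunk_size rows of scores are alive during this call.
        out.extend(attention_full(q[start:start + chunk_size], k, v))
    return out
```

Because softmax is applied per query row, splitting along the query dimension does not change the result. A smaller `chunk_size` lowers peak memory at the cost of more, smaller operations.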
Full Changelog: 1.1.0...1.2.0