VulkanShaderCUDA is a high-performance tensor computation framework that implements PyTorch-like operations using Vulkan compute shaders. The project aims to provide a vendor-agnostic alternative to CUDA-based deep learning frameworks, enabling GPU acceleration across a wider range of hardware.
Core Operations
- Element-wise addition with near-zero overhead
- Matrix multiplication (optimized with shared memory tiling)
- ReLU activation function
- Sigmoid activation function (numerically stable implementation)
- All core operations validated against PyTorch with matching precision (see the parity-check sketch below)
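As a quick parity check, the same operation can be run through both backends and the outputs compared. A minimal sketch, using the vulkan_add call shown in the usage example further down:

import numpy as np
import torch
from vulkan_backend import init_vulkan, vulkan_add

init_vulkan()
a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
c = np.zeros_like(a)
vulkan_add(a, b, c)

# Compare against the PyTorch result within float32 tolerance
reference = (torch.from_numpy(a) + torch.from_numpy(b)).numpy()
assert np.allclose(c, reference, atol=1e-6)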
Memory Management
- Zero-copy buffer pooling system (outlined in the sketch after this list)
- Efficient resource reuse
- Automated cleanup
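In outline, the pool keeps released buffers keyed by size and hands them back out instead of reallocating. The class below is a purely illustrative Python sketch of that idea; the real pool manages Vulkan buffer and memory handles inside the native backend.

from collections import defaultdict

class BufferPool:
    """Illustrative size-keyed pool; the actual backend pools Vulkan buffers natively."""
    def __init__(self):
        self._free = defaultdict(list)   # size in bytes -> list of idle buffers

    def acquire(self, size, create_fn):
        # Reuse an idle buffer of the right size if one exists, otherwise allocate.
        if self._free[size]:
            return self._free[size].pop()
        return create_fn(size)

    def release(self, buffer, size):
        # Return the buffer to the pool instead of destroying it.
        self._free[size].append(buffer)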
Advanced Operations
- Softmax (numerical stability improvements in progress; the standard stable formulation is sketched after this list)
- MaxPool2D (implementation refinements ongoing)
- Conv2D (tensor reshape handling in progress)
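For reference, the usual way to keep softmax from overflowing is to subtract the per-row maximum before exponentiating, which leaves the result mathematically unchanged. A NumPy sketch of the stable form:

import numpy as np

def stable_softmax(x, axis=-1):
    # Subtracting the max keeps exp() from overflowing; softmax is shift-invariant.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]], dtype=np.float32)  # would overflow a naive exp()
print(stable_softmax(logits))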
Gradient Computations
- Element-wise operation gradients complete
- Matrix multiplication gradients working (closed-form expressions sketched after this list)
- Advanced operation gradients in development
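For C = A @ B, the backward pass is the textbook pair of products: grad_A = grad_C @ B^T and grad_B = A^T @ grad_C. A NumPy sketch, cross-checked against PyTorch autograd:

import numpy as np
import torch

A = np.random.rand(4, 3).astype(np.float32)
B = np.random.rand(3, 5).astype(np.float32)
grad_C = np.random.rand(4, 5).astype(np.float32)  # upstream gradient dL/dC

# Closed-form gradients for C = A @ B
grad_A = grad_C @ B.T
grad_B = A.T @ grad_C

# Cross-check with PyTorch autograd
tA = torch.tensor(A, requires_grad=True)
tB = torch.tensor(B, requires_grad=True)
(tA @ tB).backward(torch.tensor(grad_C))
assert np.allclose(grad_A, tA.grad.numpy(), atol=1e-5)
assert np.allclose(grad_B, tB.grad.numpy(), atol=1e-5)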
Technical Highlights
- Memory-first architecture with buffer pooling
- Vulkan compute shader-based operations
- PyBind11 integration for seamless NumPy interop
- SPIR-V shader compilation pipeline
- Shared memory utilization in compute shaders
- Workgroup size optimization (dispatch sizing sketched after this list)
- Asynchronous command buffer execution
- Minimal host-device transfers
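One practical detail behind workgroup sizing: the number of workgroups dispatched is the ceiling of the problem size divided by the local workgroup size, with out-of-range invocations masked off in the shader. A small sketch of the host-side arithmetic; the helper and the 16x16 local size are illustrative, not the project's actual values:

def dispatch_counts(shape, local_size):
    """Ceiling-divide each problem dimension by the workgroup's local size."""
    return tuple((dim + local - 1) // local for dim, local in zip(shape, local_size))

# e.g. a 1000x1000 problem with a hypothetical 16x16 workgroup
print(dispatch_counts((1000, 1000), (16, 16)))  # -> (63, 63); edge groups are partially masked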
Prerequisites
Vulkan SDK:
# Download and install from: https://vulkan.lunarg.com/sdk/home
# Minimum version: 1.3.296.0
Python Environment:
pip install numpy pybind11 torch torchvision torchaudio
Then run the included setup script:
setup_vulkan_project.bat
The script handles:
- Vulkan SDK environment configuration
- Python virtual environment setup
- Dependency installation
- SPIR-V shader compilation (a glslc invocation is sketched after this list)
- Backend module building
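Shader compilation can also be done by hand with glslc, the GLSL-to-SPIR-V compiler bundled with the Vulkan SDK. A sketch of driving it from Python; the directory layout and file names are placeholders, not the repository's actual paths:

import subprocess
from pathlib import Path

# Compile every GLSL compute shader in a directory to SPIR-V with glslc.
shader_dir = Path("shaders")
for src in shader_dir.glob("*.comp"):
    spv = src.with_suffix(".spv")
    subprocess.run(["glslc", str(src), "-o", str(spv)], check=True)
    print(f"compiled {src} -> {spv}")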
Basic Usage
import numpy as np
from vulkan_backend import init_vulkan, vulkan_add, vulkan_matmul, vulkan_relu, vulkan_sigmoid
# Initialize Vulkan
init_vulkan()
# Element-wise Addition
a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
c = np.zeros_like(a)
vulkan_add(a, b, c)
# Matrix Multiplication
M, K, N = 128, 256, 128
a = np.random.rand(M, K).astype(np.float32)
b = np.random.rand(K, N).astype(np.float32)
c = np.zeros((M, N), dtype=np.float32)
vulkan_matmul(a.flatten(), b.flatten(), c.ravel(), M, K, N)  # pass c.ravel() (a view) so results land in c; flatten() would hand the backend a copy
# Activation Functions
input_data = np.random.rand(64, 64).astype(np.float32)
output = np.zeros_like(input_data)
vulkan_relu(input_data.ravel(), output.ravel())      # ravel() gives views, so results are written into output
vulkan_sigmoid(input_data.ravel(), output.ravel())
Roadmap
- Stabilize Softmax implementation
- Complete Conv2D tensor handling
- Optimize MaxPool2D implementation
- Add BatchNorm support
- Implement automatic differentiation
- Add layer abstractions
- Support model import/export
- Optimize memory patterns for training
- Full PyTorch model compatibility
- Custom model deployment pipeline
- Mobile GPU optimization
- Distributed computing support
Optimization Details
- Smart buffer pooling system
- Automatic resource cleanup
- Zero-copy operations where possible
- Shared memory optimization
- SPIR-V based compute shaders
- Workgroup optimization
- Local memory utilization
- Batched operation support
Contributions are welcome! We're particularly interested in:
- Numerical stability improvements
- Memory optimization techniques
- New operation implementations
- Testing and validation
For technical support:
- Discord: Contact waefrebeorn
- Submit issues through GitHub
MIT License - See LICENSE file for details
Acknowledgments
- Vulkan SDK team for comprehensive documentation
- PyBind11 team for Python binding capabilities
- PyTorch team for architectural inspiration
- Open-source ML community for testing and feedback