VulkanShaderCUDA is a high-performance tensor computation framework that implements PyTorch-like operations using Vulkan compute shaders. The project aims to provide a vendor-agnostic alternative to CUDA-based deep learning frameworks, enabling GPU acceleration across a wider range of hardware.
Core Operations
- Element-wise addition with near-zero overhead
- Matrix multiplication (optimized with shared memory tiling)
- ReLU activation function
- Sigmoid activation function (numerically stable implementation)
- All core operations validated against PyTorch with matching precision (see the parity-check sketch below)
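As a quick parity check, the same operation can be run through both backends and the outputs compared. A minimal sketch, using the vulkan_add call shown in the usage example further down:

import numpy as np
import torch
from vulkan_backend import init_vulkan, vulkan_add

init_vulkan()
a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
c = np.zeros_like(a)
vulkan_add(a, b, c)

# Compare against the PyTorch result within float32 tolerance
reference = (torch.from_numpy(a) + torch.from_numpy(b)).numpy()
assert np.allclose(c, reference, atol=1e-6)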
Memory Management
- Zero-copy buffer pooling system (outlined in the sketch after this list)
- Efficient resource reuse
- Automated cleanup
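In outline, the pool keeps released buffers keyed by size and hands them back out instead of reallocating. The class below is a purely illustrative Python sketch of that idea; the real pool manages Vulkan buffer and memory handles inside the native backend.

from collections import defaultdict

class BufferPool:
    """Illustrative size-keyed pool; the actual backend pools Vulkan buffers natively."""
    def __init__(self):
        self._free = defaultdict(list)   # size in bytes -> list of idle buffers

    def acquire(self, size, create_fn):
        # Reuse an idle buffer of the right size if one exists, otherwise allocate.
        if self._free[size]:
            return self._free[size].pop()
        return create_fn(size)

    def release(self, buffer, size):
        # Return the buffer to the pool instead of destroying it.
        self._free[size].append(buffer)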
Advanced Operations
- Softmax (numerical stability improvements in progress; the standard stable formulation is sketched after this list)
- MaxPool2D (implementation refinements ongoing)
- Conv2D (tensor reshape handling in progress)
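For reference, the usual way to keep softmax from overflowing is to subtract the per-row maximum before exponentiating, which leaves the result mathematically unchanged. A NumPy sketch of the stable form:

import numpy as np

def stable_softmax(x, axis=-1):
    # Subtracting the max keeps exp() from overflowing; softmax is shift-invariant.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]], dtype=np.float32)  # would overflow a naive exp()
print(stable_softmax(logits))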
Gradient Computations
- Element-wise operation gradients complete
- Matrix multiplication gradients working (closed-form expressions sketched after this list)
- Advanced operation gradients in development
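For C = A @ B, the backward pass is the textbook pair of products: grad_A = grad_C @ B^T and grad_B = A^T @ grad_C. A NumPy sketch, cross-checked against PyTorch autograd:

import numpy as np
import torch

A = np.random.rand(4, 3).astype(np.float32)
B = np.random.rand(3, 5).astype(np.float32)
grad_C = np.random.rand(4, 5).astype(np.float32)  # upstream gradient dL/dC

# Closed-form gradients for C = A @ B
grad_A = grad_C @ B.T
grad_B = A.T @ grad_C

# Cross-check with PyTorch autograd
tA = torch.tensor(A, requires_grad=True)
tB = torch.tensor(B, requires_grad=True)
(tA @ tB).backward(torch.tensor(grad_C))
assert np.allclose(grad_A, tA.grad.numpy(), atol=1e-5)
assert np.allclose(grad_B, tB.grad.numpy(), atol=1e-5)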
Technical Highlights
- Memory-first architecture with buffer pooling
- Vulkan compute shader-based operations
- PyBind11 integration for seamless NumPy interop
- SPIR-V shader compilation pipeline
- Shared memory utilization in compute shaders
- Workgroup size optimization (dispatch sizing sketched after this list)
- Asynchronous command buffer execution
- Minimal host-device transfers
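One practical detail behind workgroup sizing: the number of workgroups dispatched is the ceiling of the problem size divided by the local workgroup size, with out-of-range invocations masked off in the shader. A small sketch of the host-side arithmetic; the helper and the 16x16 local size are illustrative, not the project's actual values:

def dispatch_counts(shape, local_size):
    """Ceiling-divide each problem dimension by the workgroup's local size."""
    return tuple((dim + local - 1) // local for dim, local in zip(shape, local_size))

# e.g. a 1000x1000 problem with a hypothetical 16x16 workgroup
print(dispatch_counts((1000, 1000), (16, 16)))  # -> (63, 63); edge groups are partially masked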
Prerequisites
Vulkan SDK:
# Download and install from: https://vulkan.lunarg.com/sdk/home
# Minimum version: 1.3.296.0
Python Environment:
pip install numpy pybind11 torch torchvision torchaudio
Then run the included setup script:
setup_vulkan_project.bat
The script handles:
- Vulkan SDK environment configuration
- Python virtual environment setup
- Dependency installation
- SPIR-V shader compilation (a glslc invocation is sketched after this list)
- Backend module building
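Shader compilation can also be done by hand with glslc, the GLSL-to-SPIR-V compiler bundled with the Vulkan SDK. A sketch of driving it from Python; the directory layout and file names are placeholders, not the repository's actual paths:

import subprocess
from pathlib import Path

# Compile every GLSL compute shader in a directory to SPIR-V with glslc.
shader_dir = Path("shaders")
for src in shader_dir.glob("*.comp"):
    spv = src.with_suffix(".spv")
    subprocess.run(["glslc", str(src), "-o", str(spv)], check=True)
    print(f"compiled {src} -> {spv}")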
Basic Usage
import numpy as np
from vulkan_backend import init_vulkan, vulkan_add, vulkan_matmul, vulkan_relu, vulkan_sigmoid
# Initialize Vulkan
init_vulkan()
# Element-wise Addition
a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
c = np.zeros_like(a)
vulkan_add(a, b, c)
# Matrix Multiplication
M, K, N = 128, 256, 128
a = np.random.rand(M, K).astype(np.float32)
b = np.random.rand(K, N).astype(np.float32)
c = np.zeros((M, N), dtype=np.float32)
vulkan_matmul(a.flatten(), b.flatten(), c.ravel(), M, K, N)  # pass c.ravel() (a view) so results land in c; flatten() would hand the backend a copy
# Activation Functions
input_data = np.random.rand(64, 64).astype(np.float32)
output = np.zeros_like(input_data)
vulkan_relu(input_data.ravel(), output.ravel())      # ravel() gives views, so results are written into output
vulkan_sigmoid(input_data.ravel(), output.ravel())
Roadmap
- Stabilize Softmax implementation
- Complete Conv2D tensor handling
- Optimize MaxPool2D implementation
- Add BatchNorm support
- Implement automatic differentiation
- Add layer abstractions
- Support model import/export
- Optimize memory patterns for training
- Full PyTorch model compatibility
- Custom model deployment pipeline
- Mobile GPU optimization
- Distributed computing support
Optimization Details
- Smart buffer pooling system
- Automatic resource cleanup
- Zero-copy operations where possible
- Shared memory optimization
- SPIR-V based compute shaders
- Workgroup optimization
- Local memory utilization
- Batched operation support
Contributions are welcome! We're particularly interested in:
- Numerical stability improvements
- Memory optimization techniques
- New operation implementations
- Testing and validation
For technical support:
- Discord: Contact waefrebeorn
- Submit issues through GitHub
MIT License - See LICENSE file for details
Acknowledgments
- Vulkan SDK team for comprehensive documentation
- PyBind11 team for Python binding capabilities
- PyTorch team for architectural inspiration
- Open-source ML community for testing and feedback