Note that this repository is under active development.
Section | Video | Code |
---|---|---|
01 | Episode 1: Setting up a cross-platform CUDA development environment with CuPy | course01_hello_cuda |
- ...
- ...
Thanks to the following excellent public learning resources.
-
codingonion/awesome-cuda-tensorrt-fpga
: A collection of some awesome public NVIDIA CUDA, TensorRT, AMD ROCm and FPGA projects.
-
codingonion/cuda-beginner-course-cpp-version
: Companion code for the bilibili video series "CUDA 12.1 Parallel Programming for Beginners (C++ Version)".
-
codingonion/cuda-beginner-course-rust-version
: Companion code for the bilibili video series "CUDA 12.1 Parallel Programming for Beginners (Rust Version)".
-
codingonion/cuda-beginner-course-python-version
: Companion code for the bilibili video series "CUDA 12.1 Parallel Programming for Beginners (Python Version)".
-
NVIDIA CUDA Docs : CUDA Toolkit Documentation.
-
NVIDIA/cuda-samples
: Samples for CUDA developers that demonstrate features in the CUDA Toolkit.
-
NVIDIA/CUDALibrarySamples
: CUDA Library Samples.
-
HeKun-NVIDIA/CUDA-Programming-Guide-in-Chinese
: A Chinese translation of the CUDA C Programming Guide.
-
brucefan1983/CUDA-Programming
: Sample codes for my CUDA programming book.
-
YouQixiaowu/CUDA-Programming-with-Python
: Python versions of the sample code from the book CUDA Programming, written with the pycuda module.
-
QINZHAOYU/CudaSteps
: A CUDA learning path based on the book 《cuda编程-基础与实践》 (CUDA Programming: Basics and Practice) by 樊哲勇 (Zheyong Fan).
-
sangyc10/CUDA-code
: Companion code for the bilibili video tutorial series "Introduction to CUDA Programming Basics" (continuously updated).
-
RussWong/CUDATutorial
: A CUDA tutorial for learning CUDA programming from scratch.
-
DefTruth/cuda-learn-note
: 🎉 CUDA notes / collection of frequently asked interview questions / C++ notes; personal notes, updated occasionally: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
-
Liu-xiandong/How_to_optimize_in_GPU
: This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
-
enp1s0/ozIMMU
: FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme. arxiv.org/abs/2306.11975
-
Bruce-Lee-LY/matrix_multiply
: Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.
-
Bruce-Lee-LY/cuda_hgemm
: Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
-
Bruce-Lee-LY/cuda_hgemv
: Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
-
Cjkkkk/CUDA_gemm
: A simple high performance CUDA GEMM implementation.
-
AyakaGEMM/Hands-on-GEMM
: A GEMM tutorial.
-
zpzim/MSplitGEMM
: Large matrix multiplication in CUDA.
-
jundaf2/CUDA-INT8-GEMM
: CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API.
-
chanzhennan/cuda_gemm_benchmark
: Based on gtest/benchmark; see https://github.com/Liu-xiandong/How_to_optimize_in_GPU.
-
YuxueYang1204/CudaDemo
: Implement custom operators in PyTorch with CUDA/C++.
-
CoffeeBeforeArch/cuda_programming
: Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch.
-
rbaygildin/learn-gpgpu
: Algorithms implemented in CUDA + resources about GPGPU.
-
PacktPublishing/Learn-CUDA-Programming
: Learn CUDA Programming, published by Packt.
-
PacktPublishing/Hands-On-GPU-Accelerated-Computer-Vision-with-OpenCV-and-CUDA
: Hands-On GPU Accelerated Computer Vision with OpenCV and CUDA, published by Packt.
-
PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA
: Hands-On GPU Programming with Python and CUDA, published by Packt.