View YconquestY's full-sized avatar

A repository to unravel the language of GPUs, making their kernel conversations easy to understand

Python 131 4 Updated Mar 6, 2025

RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…

Python 13,274 894 Updated Feb 27, 2025

Where GPUs get cooked 👩‍🍳🔥

Rust 188 8 Updated Mar 4, 2025

The P programming language.

C# 3,190 189 Updated Mar 7, 2025

A lightweight data processing framework built on DuckDB and 3FS.

Python 3,860 315 Updated Mar 5, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 7,584 655 Updated Mar 7, 2025

Analyze computation-communication overlap in V3/R1.

888 113 Updated Mar 3, 2025

Expert Parallelism Load Balancer

Python 1,021 147 Updated Feb 27, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,512 241 Updated Mar 5, 2025

rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.

C++ 60 12 Updated Mar 6, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,806 466 Updated Mar 5, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,039 604 Updated Mar 6, 2025

A list of companies of possible interest for mathematicians (or related) that are looking for a job in quantitative finance in Zurich.

187 9 Updated Jan 26, 2024

FlashMLA: Efficient MLA decoding kernels

C++ 11,176 777 Updated Mar 1, 2025

Natural Number Game

Lean 150 41 Updated Feb 17, 2025

🎨 Refly is an open-source AI-native creation engine. Its intuitive free-form canvas interface combines multi-threaded dialogues, AI knowledge base integration, chrome extension clip & save, context…

TypeScript 1,891 150 Updated Mar 7, 2025

nanobind: tiny and efficient C++/Python bindings

C++ 2,626 220 Updated Mar 2, 2025

Lean 4 programming language and theorem prover

Lean 5,158 471 Updated Mar 7, 2025

Various HDL (Verilog) IP Cores

Verilog 747 218 Updated Jul 1, 2021

Source code examples from the Parallel Forall Blog

HTML 1,266 638 Updated Jul 23, 2024

A lightweight (3 file, single function) library for running micro-benchmarks on C++ code

C++ 79 14 Updated Dec 16, 2015

C++14 lock-free queue.

C++ 1,605 183 Updated Feb 10, 2025

A fast single-producer, single-consumer lock-free queue for C++

C++ 3,978 680 Updated Jul 9, 2024

Repo for counting stars and contributing. Press F to pay respect to glorious developers.

270,373 21,118 Updated Oct 3, 2024

Reproduce R1 Zero on Logic Puzzle

Python 2,029 128 Updated Mar 3, 2025

AXI, AXI stream, Ethernet, and PCIe components in System Verilog

SystemVerilog 98 15 Updated Mar 1, 2025

User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)

TypeScript 32,931 3,124 Updated Mar 4, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,001 1,404 Updated Feb 1, 2025

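A simple technical explainer tutorial project, focused mainly on explaining interesting, cutting-edge technical concepts and principles. Each article aims to be readable within 5 minutes.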

1,557 61 Updated Mar 1, 2025
Python 1,300 99 Updated Feb 15, 2025