Skip to content

BlinkDL/fast.c

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

fast.c

Prepare for DeekSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.

Current optimization target:

CPU = AMD 7700 (Zen 4)

DRAM = DDR5-6000 dual channel

SSD = 4 x PCIE5.0x4

Current results:

NVME raw speed = 53+ GB/s

DRAM processing = 60.8 GB/s

SSD processing = 29.4 GB/s (huge room for improvements, such as ovelapping io & compute, raw fs)

GEMV int8/32 = 75 ~ 147 GB/s (?!)

About

Prepare for DeekSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages