A CPU tool for benchmarking the peak of floating points

pigirons/cpufp

cpufp

cpufp is a CPU tool for benchmarking the peak performance of floating-point and AI ISAs.

It automatically detects the SIMD|DSA ISAs available on the local machine at compile time.
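
As an illustration of what such detection involves, the sketch below maps CPU feature flags, as Linux reports them in /proc/cpuinfo, to some of the ISA names used in the tables of this README. It is not cpufp's build logic, which this README does not describe; the function name `detect_isas` and the flag-to-ISA mapping are assumptions made for the example.

```python
# Illustrative only: maps Linux /proc/cpuinfo feature flags to ISA names.
# The mapping is an assumption for this sketch, not cpufp's build logic.
FLAG_TO_ISA = {
    "sse": "SSE",
    "sse2": "SSE2",
    "avx": "AVX",
    "fma": "FMA",
    "avx512f": "AVX512f",
    "avx512_vnni": "AVX512_VNNI",
    "avx_vnni": "AVX_VNNI",
    "amx_int8": "AMX_INT8",
    "amx_bf16": "AMX_BF16",
    "asimd": "asimd",
    "asimdhp": "asimd_hp",
    "asimddp": "asimd_dp",
    "bf16": "bf16",
    "i8mm": "i8mm",
}


def detect_isas(cpuinfo_text):
    """Return the sorted ISA names whose flags appear in cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels list features under "flags", arm64 kernels under "Features".
        if key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return sorted(FLAG_TO_ISA[f] for f in flags if f in FLAG_TO_ISA)


# Hypothetical excerpt of /proc/cpuinfo on an x86-64 machine.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx fma\n"
print(detect_isas(sample))  # ['AVX', 'FMA', 'SSE', 'SSE2']
```

On a Linux machine the same function can be fed the contents of /proc/cpuinfo directly.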

Supported OS and ISA

| Arch | Linux | MacOS | Windows |
| --- | --- | --- | --- |
| arm64 | yes | yes | no |
| e2k | yes | no | no |
| loongarch64 | yes | no | no |
| riscv64 | yes | no | no |
| x86-64 | yes | no | no |

Supported x86-64 SIMD|DSA ISAs

| Arch | ISA | Feature | Data Type | Description |
| --- | --- | --- | --- | --- |
| SIMD | SSE | Vector | fp32 | Before Sandy Bridge |
| SIMD | SSE2 | Vector | fp64 | Before Sandy Bridge |
| SIMD | AVX | Vector | fp32/fp64 | From Sandy Bridge |
| SIMD | FMA | Vector | fp32/fp64 | From Haswell/Zen |
| SIMD | AVX512f | Vector | fp32/fp64 | From Skylake X/Zen4 |
| SIMD | AVX512_VNNI | Vector | int8/int16 | From IceLake |
| SIMD | AVX_VNNI | Vector | int8/int16 | From Alder Lake |
| SIMD | AVX512_FP16 | Vector | fp16 | From Intel Sapphire Rapids |
| SIMD | AVX512_BF16 | Vector | bf16 | From AMD Zen4 |
| SIMD | AVX_VNNI_INT8 | Vector | int8 | Unknown |
| DSA | AMX_INT8 | Matrix | int8 | From Intel Sapphire Rapids |
| DSA | AMX_BF16 | Matrix | bf16 | From Intel Sapphire Rapids |
| DSA | AMX_FP16 | Matrix | fp16 | From Intel Granite Rapids |
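
For reference, a theoretical peak for the vector ISAs above can be estimated from the register width and the data type. The sketch below assumes 128-bit registers for SSE, 256-bit for AVX and FMA, and 512-bit for AVX512f, and counts a fused multiply-add as two floating-point operations per lane. The function name `peak_gflops` and the `fma_units`, `cores`, and `ghz` parameters are hypothetical inputs the reader must supply for a given CPU; cpufp measures the peak rather than computing it this way.

```python
# Register widths in bits for some x86-64 vector ISAs (FMA in its 256-bit form).
VECTOR_BITS = {"SSE": 128, "AVX": 256, "FMA": 256, "AVX512f": 512}


def peak_gflops(isa, dtype_bits, fma_units, cores, ghz, fused=True):
    """Estimate a theoretical peak in GFLOPS.

    fma_units, cores and ghz are assumptions supplied by the caller;
    fused=True counts a multiply-add as 2 FLOPs per lane per instruction.
    """
    lanes = VECTOR_BITS[isa] // dtype_bits
    flops_per_instr = lanes * (2 if fused else 1)
    return flops_per_instr * fma_units * cores * ghz


# Hypothetical CPU: 8 cores at 3.0 GHz, two 256-bit FMA units per core, fp32.
print(peak_gflops("FMA", 32, fma_units=2, cores=8, ghz=3.0))  # 768.0
```

Comparing such an estimate with the measured result shows how close a given core gets to its nominal issue rate.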

Supported arm64 SIMD ISAs

| Arch | ISA | Feature | Data Type | Description |
| --- | --- | --- | --- | --- |
| SIMD | asimd | Vector | fp32/fp64 | From Cortex-A57/A53 |
| SIMD | asimd_hp | Vector | fp16 | From Cortex-A75/A55 |
| SIMD | asimd_dp | Vector | int8 | From Cortex-A75/A55 |
| SIMD | bf16 | Matrix | bf16 | From Cortex-X2/A710/A510 |
| SIMD | i8mm | Matrix | int8 | From Cortex-X2/A710/A510 |

Supported riscv64 VECTOR ISAs

| Arch | ISA | Feature | Data Type | Description |
| --- | --- | --- | --- | --- |
| SIMD | V | Vector | fp16/fp32/fp64 | From RISC-V "V" vector extension, version 1.0 |
| DSA | ime | Matrix | int8 | From SpacemiT-X60 |

NOTE: ime is a SpacemiT custom vendor extension.

Supported loongarch64 ISAs

| Arch | ISA | Feature | Data Type | Description |
| --- | --- | --- | --- | --- |
| SIMD | LASX | Vector | fp32/fp64 | From Loongson 3A5000 |
| SIMD | LSX | Vector | fp32/fp64 | From Loongson 3A5000 |
| Scalar | FP | Scalar | fp32/fp64 | From Loongson 3A5000 |

Supported e2k ISAs

| Arch | ISA | Feature | Vector Width | Data Type | Description |
| --- | --- | --- | --- | --- | --- |
| SIMD | v6 | Vector | 128 | fp32/fp64 | FMA |
| SIMD | v5 | Vector | 128 | fp32/fp64 | Combined operations |
| Scalar | v1-v4 | Scalar | | fp64 | Combined operations |
| SIMD | v1-v4 | Vector | 64 | fp32 | Combined operations |

Combined operations

E2K supports instructions that perform two independent operations. They resemble FMA, but with an additional rounding step, because the two operations are independent.

Example fmul_addd

fmul_addd src1, src2, src3, dst

Description

Multiply double-precision (64-bit) floating-point values from src1 and src2, and add the intermediate result to the value from src3. Store the result in dst.

Operation

dst[63:0] := src3[63:0] + src1[63:0] * src2[63:0]

Latency and Throughput

| Architecture | Latency | Throughput (CPI) | ALC |
| --- | --- | --- | --- |
| elbrus-v4 | 8 | 0.16 | 012345 |
| elbrus-v1 | 8 | 0.25 | 01-34- |

  • ALC (Arithmetic Logic Complex/Channel) is an execution port for RISC-like instructions.
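
The effect of the additional rounding can be seen with ordinary double-precision arithmetic. The sketch below is an emulation in Python, whose floats are IEEE 754 doubles on common platforms, and does not execute E2K instructions; the function names `combined_mul_add` and `fused_mul_add` are chosen for this example. The fused variant is emulated with exact rational arithmetic followed by a single rounding.

```python
from fractions import Fraction


def combined_mul_add(a, b, c):
    """Two independent operations: round after the multiply, then after the add."""
    return a * b + c


def fused_mul_add(a, b, c):
    """Emulate FMA: compute a*b + c exactly, then round once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0
# The exact product is 1 - 2**-54, which rounds to 1.0 in double precision,
# so the combined form loses the low-order term before the add.
print(combined_mul_add(a, b, c))  # 0.0
print(fused_mul_add(a, b, c))     # -2**-54, about -5.55e-17
```

The two forms agree for most inputs; they differ only when the intermediate product is not exactly representable and the subsequent add exposes the lost bits.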

How to build

build x64 version:

./build_x64.sh

build arm64 version:

./build_arm64.sh

build riscv64 version:

./build_riscv64.sh

build loongarch64 version:

./build_loongarch64.sh

build e2k version:

./build_e2k.sh

clean:

./clean.sh

How to benchmark

./cpufp --thread_pool=[xxx] --idle_time=yyy

--thread_pool: [xxx] is the list of CPU threads to benchmark; the benchmark is bound to them by setting CPU affinities. Consult the output of the lstopo command to choose the thread IDs. For example, [0,3,5-8,13-15].

--idle_time: the idle interval, in seconds, between any two consecutive benchmarks. The default is 0.
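
To make the list syntax concrete, the sketch below expands a --thread_pool value into individual thread IDs. It is not cpufp's own parser; the function name `parse_thread_pool` is hypothetical, and treating both ends of a range such as 5-8 as inclusive is an assumption based on the example above.

```python
def parse_thread_pool(spec):
    """Expand a list such as "[0,3,5-8,13-15]" into individual CPU thread IDs."""
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed list, e.g. [0,3,5-8]")
    ids = []
    for item in body[1:-1].split(","):
        item = item.strip()
        if not item:
            continue
        first, sep, last = item.partition("-")
        if sep:
            # Assumption: ranges include both endpoints.
            lo, hi = int(first), int(last)
            if lo > hi:
                raise ValueError(f"descending range: {item}")
            ids.extend(range(lo, hi + 1))
        else:
            ids.append(int(first))
    return ids


print(parse_thread_pool("[0,3,5-8,13-15]"))
# [0, 3, 5, 6, 7, 8, 13, 14, 15]
```

On Linux, a Python process can pin itself to one of these IDs with `os.sched_setaffinity(0, {cpu_id})`, which is comparable in spirit to the affinity setting described above.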

Benchmark results

| Arch | Benchmark |
| --- | --- |
| x86-64 | AMD Ryzen7 9700X |
| | AMD Ryzen7 8845HS |
| | AMD Ryzen9 6900HX |
| | Intel Xeon Gold 6455B |
| | Intel Xeon W9-3495X |
| | Intel Core i5 1340P |
| | Intel N100 |
| arm64 | Apple Silicon M4 Max |
| | Apple Silicon M2 Max |
| | Qualcomm Snapdragon X Elite X1E80100 |
| | AWS Graviton 3E |
| | Broadcom BCM2712 |
| | Broadcom BCM2711 |
| | CIX CD8180 |
| | HUAWEI Kunpeng 920 7260 |
| | HUAWEI Kunpeng D920 2249K |
| | Phytium D2000/8 |
| | RockChip RK3588 |
| | RockChip RK3399 |
| riscv64 | SpacemiT K1 |
| | Kendryte K230 |
| loongarch64 | Loongson 3A6000 |
| | Loongson 3C5000 |
| | Loongson 3A5000M |
| e2k | Elbrus 8C2 |
| | Elbrus 8C |
| | Elbrus 4C |

Todo list

Add Armv9 (SVE, SVE2 and SME) support.
