qlibs/perf


#pragma GCC system_header // Overview / Examples / API / FAQ / Resources

perf: C++23 Performance library

Performance is not a number!

Overview

Single header/module performance library that combines the power of:
c++23, linux/perf/intel_pt, llvm/mca, gnuplot/sixel, ...

Features

Benchmarking, Tuning, Profiling, Tracing, Analyzing

Name                     Description
perf::info::*            compiler, cpu, memory, sys, proc, bin
perf::core::*            code, compiler, cpu, memory
perf::time::timer        steady_clock, cpu, thread, real, monotonic, tsc
perf::stat::counter      perf stat -e
perf::stat::sampler      perf record -e / perf mem record
perf::trace::tracer      perf record -e intel_pt/
perf::mc::disassembler   assembly, address, encoding, size, uops, latency, rthroughput,
                         may_load, may_store, has_side_effects, branch::*, source
perf::mca::analyzer      timeline, resource_pressure, bottleneck
perf::runner             baseline, latency, throughput
perf::plot::*            hist, box, bar, line, ecdf
perf::*                  log, json, report, annotate

Requirements

Minimal
Optimal (recommended)

Dockerfile

Optional
  • gh - apt-get install gh
  • prof - https://github.com/qlibs/prof
  • uefi - https://github.com/qlibs/uefi
  • ut - https://github.com/qlibs/ut

Dockerfile

Examples

Info/Core
Profiling
Tracing
Analyzing
Plotting
Benchmarking
Miscellaneous

API

Configuration
/**
 * PERF version # https://semver.org
 */
#define PERF (MAJOR, MINOR, PATCH) // ex. (1, 0, 0)

/**
 * GNU # default: deduced based on `__GNUC__`
 * - 0 not compatible
 * - 1 compatible
 */
#define PERF_GNU 0/1

/**
 * Linux # default: deduced based on `__linux__` and `perf_event_open.h`
 * - 0 not supported
 * - 1 supported
 */
#define PERF_LINUX 0/1

/**
 * LLVM # default: deduced based on `llvm-dev` headers
 * - 0 not supported
 * - 1 supported
 */
#define PERF_LLVM 0/1

/**
 * Intel Processor Trace # default: deduced based on `<intel_pt.h>` header
 * - 0 not supported
 * - 1 supported
 */
#define PERF_INTEL 0/1

/**
 * Output support # default: 1
 * - 0 no output support compiled in
 * - 1 output supported (`log, json, report, annotate, plot`)
 */
#define PERF_OUT 0/1

/**
 * tests # default: not-defined
 * - defined:     disables all compile-time, run-time tests
 * - not-defined: compile-time tests executed,
 *                run-time tests available by `test::run()` API
 */
#define NTEST

/**
 * gnuplot terminal # see `gnuplot -> set terminal` # default: 'sixel'
 * - 'sixel'                  # console image # https://www.arewesixelyet.com
 * - 'wxt'                    # popup window
 * - 'dumb size 150,25 ansi'  # console with colors
 * - 'dumb size 80,25'        # console
 */
ENV:PERF_PLOT_TERM

/**
 * style # default: dark
 * - light
 * - dark
 */
ENV:PERF_PLOT_STYLE
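The two environment variables above are read at run time. A minimal sketch of how a program could resolve them with the documented defaults, assuming plain `std::getenv` lookups; `env_or`, `plot_term` and `plot_style` are hypothetical helpers, not part of the perf API:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <string_view>

// Hypothetical helper: value of the environment variable `name`,
// or `fallback` when it is unset or empty.
inline std::string env_or(const char* name, std::string_view fallback) {
  if (const char* value = std::getenv(name); value != nullptr && *value != '\0') {
    return value;
  }
  return std::string{fallback};
}

// Defaults as documented above: PERF_PLOT_TERM='sixel', PERF_PLOT_STYLE=dark.
inline std::string plot_term()  { return env_or("PERF_PLOT_TERM", "sixel"); }
inline std::string plot_style() { return env_or("PERF_PLOT_STYLE", "dark"); }
```

With `PERF_PLOT_TERM='dumb size 80,25' ./a.out`, `plot_term()` would return `"dumb size 80,25"`; without it, `"sixel"`.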
Synopsis

FAQ

Setup
  • How to setup docker?

    Dockerfile

    docker build -t perf .
    docker run \
      -it \
      --privileged \
      --network=host \
      -e DISPLAY=${DISPLAY} \
      -v ${PWD}:${PWD} \
      -w ${PWD} \
      perf
  • How to install dependencies?

    apt-get install linux-tools-common
    apt-get install llvm-dev
    apt-get install libipt-dev
    apt-get install gnuplot
  • How to setup linux performance counters?

    setup.sh

    .github/scripts/setup.sh --perf # --rdpmc --max-sample-rate 10000

    linux

    sudo mount -o remount,mode=755 /sys/kernel/debug
    sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
    sudo chown `whoami` /sys/kernel/debug/tracing/uprobe_events
    sudo chmod a+rw /sys/kernel/debug/tracing/uprobe_events
    echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
    echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
    echo 1000 | sudo tee /proc/sys/kernel/perf_event_max_sample_rate
    echo 2 | sudo tee /sys/devices/cpu_core/rdpmc
  • How to find out which performance events are supported by the cpu?

    https://perfmon-events.intel.com

    perf list
  • How to reduce execution variability?

    tune.sh

    .github/scripts/tune.sh

    pyperf

    pip3 install pyperf
    sudo pyperf system tune
    sudo pyperf system show
    sudo pyperf system reset

    linux

    # Set Process CPU Affinity (apt install util-linux)
    taskset -c 0 ./a.out
    
    # Set Process Scheduling Priority (apt install coreutils)
    sudo nice -n -20 taskset -c 0 ./a.out # -20..19 (most..least favorable to the process)
    
    # Disable CPU Frequency Scaling (apt install linux-tools-common)
    sudo cpupower frequency-set --governor performance
    # cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
    # Disable Address Space Randomization
    echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
    
    # Disable Processor Boosting
    echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost
    
    # Disable Turbo Mode
    echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
    
    # Disable Hyperthreading/SMT
    echo off | sudo tee /sys/devices/system/cpu/smt/control
    
    # Restrict memory to a single socket
    numactl -m 0 -N 0 ./a.out
    
    # Enable Huge Pages
    sudo numactl --cpunodebind=1 --membind=1 hugeadm \
      --obey-mempolicy --pool-pages-min=1G:64
    sudo hugeadm --create-mounts

    boot / grub

    # Enable Kernel Mode Task-Isolation (https://lwn.net/Articles/816298)
    isolcpus=<cpu number>,...,<cpu number> # cat /sys/devices/system/cpu/isolated
    
    # Disable P-states and C-states
    idle=poll intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=1 # cat /sys/devices/system/cpu/intel_pstate/status
    
    # Disable NMI watchdog
    nmi_watchdog=0 # cat /proc/sys/kernel/nmi_watchdog

    https://github.com/qlibs/uefi
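Two of the setup steps above (counter permissions and CPU affinity) can be checked or reproduced from C++ on Linux. The sketch below assumes glibc (`sched_getaffinity`/`sched_setaffinity`) and that `/proc/sys/kernel/perf_event_paranoid` exists; `parse_int`, `perf_event_paranoid` and `pin_to_first_allowed_cpu` are hypothetical helpers, not perf functions. Pinning mirrors `taskset -c`, but applies to the calling thread only:

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux/glibc)
#include <string>

// Parses the integer stored in a /proc/sys file, e.g. "-1\n"; nullopt on failure.
inline std::optional<int> parse_int(const std::string& text) {
  try {
    return std::stoi(text);
  } catch (...) {  // std::invalid_argument or std::out_of_range
    return std::nullopt;
  }
}

// Reads /proc/sys/kernel/perf_event_paranoid; -1 (as configured above) is the
// least restrictive setting for performance counters.
inline std::optional<int> perf_event_paranoid() {
  std::ifstream file{"/proc/sys/kernel/perf_event_paranoid"};
  std::string text;
  if (!std::getline(file, text)) return std::nullopt;
  return parse_int(text);
}

// Similar to `taskset -c <cpu>`, but for the calling thread only: pins it to
// the first CPU in its current affinity mask and returns that CPU (-1 on error).
inline int pin_to_first_allowed_cpu() {
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (!CPU_ISSET(cpu, &allowed)) continue;
    cpu_set_t target;
    CPU_ZERO(&target);
    CPU_SET(cpu, &target);
    return sched_setaffinity(0, sizeof(target), &target) == 0 ? cpu : -1;
  }
  return -1;
}
```

Calling `pin_to_first_allowed_cpu()` at the start of a benchmark and checking that `perf_event_paranoid()` returns `-1` is one possible preflight; it does not replace the system-level tuning listed above.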

Usage
  • How to use perf with modules?

    clang++-20 -std=c++23 -O3 -I. --precompile perf.cppm
    clang++-20 -std=c++23 -O3 -fprebuilt-module-path=. perf.pcm *.cpp -lLLVM-18 -lipt
  • What is prevent_elision and when is it needed?

    An optimizing compiler may elide code completely if it considers the code unnecessary, i.e. the code has no observable side effects. prevent_elision ensures that the code won't be elided by the compiler.

    verify(perf::compiler::is_elided([] { }));
    verify(perf::compiler::is_elided([] { auto value = 4 + 2; }));
    verify(perf::compiler::is_elided([] { int i{}; i++; }));
    int i{};
    verify(not perf::compiler::is_elided([&] { i++; }));
    verify(not perf::compiler::is_elided([] { static int i; i++; }));
    verify(not perf::compiler::is_elided([=] {
      int i{};
      perf::compiler::prevent_elision(i++);
    }));
  • How to change assembly syntax?

    perf::llvm llvm{
      {.syntax = perf::arch::syntax::att} // default: intel
    };
  • How to disassemble for a different platform?

    perf::llvm llvm{
      .triple = "x86_64-pc-linux-gnu" // see `llvm-llc` for details
    };
  • How to write custom profiler?

    struct profiler {
      constexpr auto start();
      constexpr auto stop();
      [[nodiscard]] constexpr auto operator*() const;
    };
    static_assert(perf::profiler_like<profiler>);
  • How to integrate with unit-testing framework?

    import perf;
    import ut; // https://github.com/qlibs/ut
    
    int main() {
      "benchmark"_test = [] {
        // ...
      };
    }
  • Which terminal can display images?

    Any terminal with sixel support - https://www.arewesixelyet.com

  • How to plot on the server without sixel?

    PERF_PLOT_TERM='dumb' ./a.out
    PERF_PLOT_TERM='dumb size 80,25' ./a.out
    PERF_PLOT_TERM='dumb size 150,25 ansi' ./a.out
  • How to plot using windows-based charts?

    PERF_PLOT_TERM='wxt' ./a.out
  • How to change plots style?

    PERF_PLOT_STYLE='dark' ./perf # default
    PERF_PLOT_STYLE='light' ./perf
  • How to save plots?

    perf::plot::gnuplot plt{{.term = "png"}};
    plt.send("set output 'output.png'");
    perf::plot::bar(plt, ...);
  • How to export results?

    export.sh

    ./a.out 2>&1 | .github/scripts/export.sh markdown > results.md
    ./a.out 2>&1 | .github/scripts/export.sh notebook > results.ipynb
    ./a.out 2>&1 | .github/scripts/export.sh html > results.html
  • How to share results?

    gh - apt-get install gh

    # https://jbt.github.io/markdown-editor
    gh gist create --public --web results.md
    # https://jupyter.org
    gh gist create --public --web results.ipynb
    # https://htmlpreview.github.io
    gh gist create --public --web results.html
  • How do perf tests work?

    Compile-time tests are executed upon include/import (enabled by default).
    Run-time (sanity-check) tests can be executed at run time:

    int main() {
      perf::test::run({.verbose = true});
    }

    Tests can be disabled with -DNTEST (not recommended)

    $CXX -DNTEST ... # tests will NOT be compiled in
    #ifndef NTEST
    "perf"_suite = [] {
      "run-time and compile-time"_test = [] constexpr {
        expect(3 == accumulate({1, 2, 3}, 0));
      };
    
      "run-time only"_test = [] mutable {
        expect(std::rand() >= 0);
      };
    
      "compile-time"_test = [] consteval {
        expect(sizeof(int) == sizeof(0));
      };
    };
    #endif
Performance
  • Latency vs. Throughput?

    Latency is the time it takes for a single operation to complete (ns)
    Throughput is the total number of operations or tasks completed in a given amount of time (op/s)

  • What are performance compilation flags?

    -O1                     # optimizations (O1) [0]
    -O2                     # optimizations (O1 + O2) [0]
    -O3                     # [unsafe] optimizations (O1 + O2 + O3) [0]
    -DNDEBUG                # disables asserts, etc.
    -march=native           # generates code for the instruction set of the host cpu [1]
    -ffast-math             # [unsafe] faster but non-conforming math [2]
    -g                      # debug symbols
    -fno-omit-frame-pointer # always keep the frame pointer in a register
    -fcf-protection=none    # [unsafe] stops emitting `endbr64`

    [0] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
    [1] https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
    [2] https://gcc.gnu.org/wiki/FloatingPointMath

  • What are performance compiler attributes?

    // target
    [[gnu::target("avx2")]]
    [[gnu::target("bmi")]]
    
    // optimize
    [[gnu::optimize("O3")]]
    [[gnu::optimize("fast-math")]]

    https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

  • What is Top-Down Microarchitecture Analysis method?

    https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2023-0/top-down-microarchitecture-analysis-method.html
    https://github.com/andikleen/pmu-tools/wiki/toplev-manual

  • What is Jupyter Notebook?

    Jupyter Notebook can be used for data analysis (python)

    # apt install jupyter
    jupyter notebook --ip=0.0.0.0 --no-browser notebook.ipynb
  • What are different benchmarking workflows?

    Dev (cpp -> ./a.out)
      (+) easy editing and integration with existing tooling
      (+) output to the console (plots with sixel)
      (+) easy to share via gist/markdown
      (+) can assert and verify expectations
      (+) can be executed in headless mode on the server without UI
      (-) harder analysis than in python
      (-) single compiler workflow

    Research (cpp -> ./a.out (json) -> notebook/python)
      (+) powerful data analysis (python)
      (+) can run different compilers/options
      (+) easy to share via jupyter notebook
      (+) can be used on kaggle/google colab
      (-) cpp is not well supported by notebooks but building/running works fine
      (~) usually requires running through a browser
      (-) not suited for assembly analysis
      (-) might be slow with a lot of data

  • How to avoid benchmarking pitfalls?

    WHERE (cpu, os effects)
    WHAT (latency vs. throughput)
    HOW (statistical, realistic, structured/top-down, visual)
    WHY (understanding, analysis, verification, sharing)
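Latency vs. throughput, and the HOW (statistical) point above, are easier to keep apart when a benchmark reports a distribution instead of a single number. A sketch, assuming per-operation latencies were measured one at a time; `percentile`, `summary` and `summarize` are hypothetical names. Throughput derived this way is simply the reciprocal of the mean serial latency; when operations overlap (pipelining, multiple threads), throughput has to be measured separately and can be much higher than 1 / latency.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile (p in [0, 1]) computed on a sorted copy of the samples.
inline double percentile(std::vector<double> samples, double p) {
  assert(!samples.empty() && p >= 0.0 && p <= 1.0);
  std::sort(samples.begin(), samples.end());
  auto rank = static_cast<std::size_t>(std::ceil(p * samples.size()));
  rank = std::clamp<std::size_t>(rank, 1, samples.size());
  return samples[rank - 1];
}

struct summary {
  double median_ns;       // typical latency of one operation
  double p99_ns;          // tail latency
  double throughput_ops;  // operations per second over the whole serial run
};

// samples_ns: latencies (in ns) of operations executed one after another.
inline summary summarize(const std::vector<double>& samples_ns) {
  double total_ns = 0;
  for (double s : samples_ns) total_ns += s;
  return {
      .median_ns = percentile(samples_ns, 0.50),
      .p99_ns = percentile(samples_ns, 0.99),
      .throughput_ops = samples_ns.size() / (total_ns * 1e-9),
  };
}
```

For example, ten samples of 2 ns each give a median of 2 ns and a serial throughput of about 5e8 op/s.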
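The effect of the compilation flags and attributes listed above can be inspected from code. The sketch relies on GCC/Clang predefined macros (`__OPTIMIZE__`, `__FAST_MATH__`, `__AVX2__`) and on `[[gnu::target("avx2")]]` plus `__builtin_cpu_supports` (x86 only) for runtime dispatch; `active_flags`, `sum_avx2`, `sum_default` and `sum` are hypothetical names. Whether the AVX2 clone is actually vectorized depends on the optimization level.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Reports which of the flags above affected this translation unit.
inline std::string active_flags() {
  std::string flags;
#ifdef __OPTIMIZE__
  flags += "optimized ";  // any -O level above -O0
#endif
#ifdef NDEBUG
  flags += "NDEBUG ";
#endif
#ifdef __FAST_MATH__
  flags += "fast-math ";  // -ffast-math
#endif
#ifdef __AVX2__
  flags += "avx2 ";  // e.g. -mavx2, or -march=native on an AVX2 host
#endif
  return flags.empty() ? "none" : flags;
}

inline long sum_default(const std::vector<int>& v) {
  long total = 0;
  for (int x : v) total += x;
  return total;
}

#if defined(__x86_64__) || defined(__i386__)
// Same loop, but this function alone may use AVX2 instructions.
[[gnu::target("avx2")]] inline long sum_avx2(const std::vector<int>& v) {
  long total = 0;
  for (int x : v) total += x;
  return total;
}
#endif

// Runtime dispatch: call the AVX2 version only if the CPU supports it.
inline long sum(const std::vector<int>& v) {
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports("avx2")) return sum_avx2(v);
#endif
  return sum_default(v);
}
```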
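The level-1 Top-Down breakdown from the references above reduces to a few ratios over pipeline slots. A sketch assuming a 4-wide core and the pre-Ice Lake Intel event names given in the comments; newer cores expose dedicated Top-Down events, and pmu-tools applies SMT corrections not shown here. `topdown_counts`, `topdown_level1` and `topdown` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Raw event counts (event names assume pre-Ice Lake Intel cores).
struct topdown_counts {
  double cycles;              // CPU_CLK_UNHALTED.THREAD
  double uops_not_delivered;  // IDQ_UOPS_NOT_DELIVERED.CORE
  double uops_issued;         // UOPS_ISSUED.ANY
  double uops_retired_slots;  // UOPS_RETIRED.RETIRE_SLOTS
  double recovery_cycles;     // INT_MISC.RECOVERY_CYCLES
};

// Fractions of all issue slots; the four values sum to 1.
struct topdown_level1 {
  double frontend_bound, bad_speculation, retiring, backend_bound;
};

// Level-1 breakdown with slots = width * cycles (width = 4 by assumption).
inline topdown_level1 topdown(const topdown_counts& c, double width = 4.0) {
  const double slots = width * c.cycles;
  const double fe = c.uops_not_delivered / slots;
  const double bs =
      (c.uops_issued - c.uops_retired_slots + width * c.recovery_cycles) / slots;
  const double ret = c.uops_retired_slots / slots;
  return {fe, bs, ret, 1.0 - (fe + bs + ret)};  // backend bound is the remainder
}
```

With 1000 cycles, 800 undelivered uops, 2200 issued, 2000 retired slots and 50 recovery cycles, this yields 0.2 frontend bound, 0.1 bad speculation, 0.5 retiring and 0.2 backend bound.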

Resources

Specs
Feeds
Videos
Tools

License

MIT / Apache2:LLVM*