Extremely fast CPU 1D convolutions. Faster than Intel IPP and Apple Accelerate on their respective platforms
[Benchmark figure: kernel size = 245]
It's well known that convolution in the time domain is equivalent to multiplication in the frequency domain (circular convolution). With the Fast Fourier Transform, the time complexity of a discrete convolution drops from O(n^2) to O(n log n), where n is the larger of the two array sizes. The overlap-add method is a fast convolution method commonly used in FIR filtering, where the discrete signal is often much longer than the FIR filter kernel.
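For reference, the direct time-domain approach that FFT convolution replaces is the O(n * k) loop sketched below. This is not part of fftconv; it is shown only to illustrate the baseline cost and the full-convolution output length n + k - 1.

#include <cstddef>
#include <vector>

// Direct "full" linear convolution: O(n * k) multiply-adds.
// The output has length arr.size() + kernel.size() - 1.
std::vector<double> convolve_direct(const std::vector<double> &arr,
                                    const std::vector<double> &kernel) {
  std::vector<double> res(arr.size() + kernel.size() - 1, 0.0);
  for (std::size_t i = 0; i < arr.size(); ++i)
    for (std::size_t j = 0; j < kernel.size(); ++j)
      res[i + j] += arr[i] * kernel[j];
  return res;
}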
Check this repo to see how to use fftconv as a custom port through VCPKG.
- fftconv::convolve_fftw implements FFT convolution.
- fftconv::oaconvolve_fftw implements FFT convolution using the overlap-add method, which is much faster when one sequence is much longer than the other (e.g. in FIR filtering).
- All convolution functions support float and double and use a C++20 std::span interface.
template <FloatOrDouble Real>
void oaconvolve_fftw(const std::span<const Real> arr,
const std::span<const Real> kernel, std::span<Real> res);
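A minimal usage sketch is shown below. It assumes fftconv is header-only with a header named fftconv.hpp and that the result span is pre-allocated to the full-convolution length arr.size() + kernel.size() - 1; both are assumptions, not guarantees from this README.

#include <span>
#include <vector>
#include "fftconv.hpp" // header name/path is an assumption

int main() {
  std::vector<double> signal(4096, 1.0);    // long input signal
  std::vector<double> kernel(65, 1.0 / 65); // short FIR filter kernel
  // Assumption: the output is the full convolution of length n + k - 1,
  // pre-allocated by the caller.
  std::vector<double> result(signal.size() + kernel.size() - 1, 0.0);
  fftconv::oaconvolve_fftw<double>(signal, kernel, result);
}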
Python bindings are provided through Cython.
This benchmark is out of date. Check this repo for the up-to-date benchmarks.
The only dependency of fftconv is fftw3. Since both the float and double interfaces of fftw3 are used, link with -lfftw3 -lfftw3f.
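For example, a build line might look like the following (a sketch: the source file name and include path are placeholders, and it assumes fftconv is consumed as headers):

g++ -std=c++20 -O3 -I/path/to/fftconv main.cpp -lfftw3 -lfftw3f -o main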
Benchmark and test dependencies:
- fftw3
- armadillo (benchmarked against as a baseline)
- google-benchmark (used for benchmarking)
- gperftools (used for profiling)
Python
TODO: The Python wrapper is currently out of date.
A Cython wrapper is provided. Dependencies:
- Cython (for C++ bindings)
- numpy (benchmarked against)
- numba (benchmarked against)
- scipy (benchmarked against)
- matplotlib (to plot results)
python3 setup.py build_ext -i
python3 test.py # run the python test/benchmark
CPU: Intel i7 Comet Lake
C++
The test_fftconv binary provides a quick benchmark that runs every test case 5000 times. The bench_fftconv binary uses google-benchmark and gives much more reliable measurements. Use ./script/run_bench to run the benchmark and generate figures.
Output from bench_fftconv (accurate benchmark): raw results are saved in ./bench_result.json, and the figures are generated with plot_bench.py.
Output from test_fftconv (simple benchmark):
% ./build/test_fftconv
=== test case (1664, 65) ===
All tests passed.
(5000 runs) convolve_fftw took 82ms
(5000 runs) oaconvolve_fftw took 36ms
(5000 runs) convolve_pocketfft took 91ms
(5000 runs) oaconvolve_pocketfft took 70ms
(5000 runs) convolve_pocketfft_hdr took 111ms
(5000 runs) oaconvolve_pocketfft_hdr took 105ms
(5000 runs) convolve_armadillo took 108ms
=== test case (2816, 65) ===
All tests passed.
(5000 runs) convolve_fftw took 111ms
(5000 runs) oaconvolve_fftw took 60ms
(5000 runs) convolve_pocketfft took 157ms
(5000 runs) oaconvolve_pocketfft took 115ms
(5000 runs) convolve_pocketfft_hdr took 187ms
(5000 runs) oaconvolve_pocketfft_hdr took 166ms
(5000 runs) convolve_armadillo took 174ms
=== test case (2304, 65) ===
All tests passed.
(5000 runs) convolve_fftw took 536ms
(5000 runs) oaconvolve_fftw took 52ms
(5000 runs) convolve_pocketfft took 175ms
(5000 runs) oaconvolve_pocketfft took 98ms
(5000 runs) convolve_pocketfft_hdr took 206ms
(5000 runs) oaconvolve_pocketfft_hdr took 143ms
(5000 runs) convolve_armadillo took 147ms
=== test case (4352, 65) ===
All tests passed.
(5000 runs) convolve_fftw took 335ms
(5000 runs) oaconvolve_fftw took 86ms
(5000 runs) convolve_pocketfft took 319ms
(5000 runs) oaconvolve_pocketfft took 165ms
(5000 runs) convolve_pocketfft_hdr took 369ms
(5000 runs) oaconvolve_pocketfft_hdr took 235ms
(5000 runs) convolve_armadillo took 276ms
Python
% python3 test.py
=== test case (1664, 65) ===
Vectors are equal.
(5000 runs) convolve_fftw took 73ms
(5000 runs) convolve_pocketfft took 70ms
(5000 runs) oaconvolve_fftw took 38ms
(5000 runs) oaconvolve_pocketfft took 53ms
(5000 runs) np.convolve took 140ms
(5000 runs) numba.njit(np.convolve) took 1409ms
(5000 runs) scipy.signal.convolve took 162ms
(5000 runs) scipy.signal.fftconvolve took 199ms
(5000 runs) scipy.signal.oaconvolve took 321ms
=== test case (2816, 65) ===
Vectors are equal.
(5000 runs) convolve_fftw took 96ms
(5000 runs) convolve_pocketfft took 110ms
(5000 runs) oaconvolve_fftw took 60ms
(5000 runs) oaconvolve_pocketfft took 84ms
(5000 runs) np.convolve took 236ms
(5000 runs) numba.njit(np.convolve) took 2883ms
(5000 runs) scipy.signal.convolve took 256ms
(5000 runs) scipy.signal.fftconvolve took 256ms
(5000 runs) scipy.signal.oaconvolve took 362ms
=== test case (2304, 65) ===
Vectors are equal.
(5000 runs) convolve_fftw took 281ms
(5000 runs) convolve_pocketfft took 132ms
(5000 runs) oaconvolve_fftw took 53ms
(5000 runs) oaconvolve_pocketfft took 75ms
(5000 runs) np.convolve took 194ms
(5000 runs) numba.njit(np.convolve) took 2215ms
(5000 runs) scipy.signal.convolve took 213ms
(5000 runs) scipy.signal.fftconvolve took 240ms
(5000 runs) scipy.signal.oaconvolve took 346ms
=== test case (4352, 65) ===
Vectors are equal.
(5000 runs) convolve_fftw took 326ms
(5000 runs) convolve_pocketfft took 215ms
(5000 runs) oaconvolve_fftw took 82ms
(5000 runs) oaconvolve_pocketfft took 117ms
(5000 runs) np.convolve took 358ms
(5000 runs) numba.njit(np.convolve) took 3657ms
(5000 runs) scipy.signal.convolve took 378ms
(5000 runs) scipy.signal.fftconvolve took 365ms
(5000 runs) scipy.signal.oaconvolve took 395ms
The Python wrapper is almost as fast as the C++ code, as it has very little overhead.
TODO