Releases: AmusementClub/vs-dfttest2
v7: update libraries
- Upgraded to CUDA 12.4.1.
- Added support for HIP for AMD GPUs. No precompiled binaries yet.
v6: bug fixes
- Fix missing headers (#6). Thanks @shssoichiro.
- Fix dangling pointer during filter destruction.
v5: CPU dispatching, generic gcc backend
- The CPU backend implements CPU dispatching through the opt parameter, as in VapourSynth-dfttest.
- Added GCC backend based on the gcc vector extension. Benchmark on ARM64 CPUs.
Known issue
- g++ 11 may hang when building the GCC backend for the arm64 target.
v4: CPU backend (for x86_64)
The CPU backend only supports sbsize=16 and tbsize in [1, 3, 5, 7], because its implementation is based on the NVRTC backend (without run-time compilation).
The binary is compiled for avx2 in CI, but the source can be compiled for any x86_64 target (e.g. sse2, avx, avx2, avx-512).
It is not recommended to build with the latest version of the vector class library (2.02.00) since it zero-initializes vectors on the stack, which degrades performance.
Known issue
- The binary is only compiled for avx2 currently. This should be improved in the future.
v3: optimization for temporal denoising
- Improved performance for temporal denoising of the NVRTC backend (~1.2x), especially for tbsize=7 (~2x) (benchmark).
- Automatic backend selection is implemented. It will fail if the suggested backend is not found.
- The selection is based on internal heuristics and may vary from one release to another.
Full Changelog: v2...v3
v2: nvrtc backend
The dfttest2.Backend.NVRTC backend is intended to be specialized for some popular parameter groups. It delivers up to 5x performance and a noticeable memory usage reduction over the existing cuFFT backend (benchmark).
Currently it supports sbsize=16, tbsize=1 / 3 / 5 / 7. The cuFFT backend should be used for other cases.
Please create an issue for other values of interest.
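The rule above (NVRTC for sbsize=16 with tbsize in 1 / 3 / 5 / 7, cuFFT otherwise) can be sketched as a small helper. This is only an illustration: pick_backend_name and the two constants are hypothetical names that are not part of dfttest2, and the helper returns a plain string rather than a backend object so that it runs without VapourSynth installed.

```python
# Hypothetical helper, not part of dfttest2: it only mirrors the
# parameter support described in the v2 release notes.
NVRTC_SBSIZE = 16
NVRTC_TBSIZES = (1, 3, 5, 7)


def pick_backend_name(sbsize: int, tbsize: int) -> str:
    """Return "NVRTC" when the parameters fall in the supported group,
    otherwise fall back to "cuFFT"."""
    if sbsize == NVRTC_SBSIZE and tbsize in NVRTC_TBSIZES:
        return "NVRTC"
    return "cuFFT"


print(pick_backend_name(16, 3))
print(pick_backend_name(12, 3))
```

In a real script the returned name would presumably be mapped to the matching member of dfttest2.Backend before being passed as the backend argument. That mapping is left out here because dfttest2.Backend is not yet a settled interface beyond what these notes state.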
dfttest2.DFTTest and dfttest2.Backend are the only stable interfaces at present.
Usage
For the cuFFT backend, cufft64_10.dll should be placed in a folder named vsmlrt-cuda next to the plugin's dll (just extract cufft-windows-VER.7z to the same directory as the plugin dll), or in any location specified in PATH.
The NVRTC backend has no external dependencies.
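To check an installation against the cuFFT placement described above, the candidate locations can be listed with a short script. This is a sketch under assumptions: cufft_candidates is a hypothetical helper, the plugin file name dfttest2.dll in the example is made up, and the order in which the plugin actually searches these locations is not stated in these notes.

```python
import os
from pathlib import Path


def cufft_candidates(plugin_dll, dll_name="cufft64_10.dll", path_env=None):
    """List places where the cuFFT dll may be placed according to the notes:
    a vsmlrt-cuda folder next to the plugin dll, or any PATH entry.
    The ordering here is an assumption, not documented behavior."""
    plugin_dir = Path(plugin_dll).parent
    candidates = [plugin_dir / "vsmlrt-cuda" / dll_name]
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    for entry in path_env.split(os.pathsep):
        if entry:  # skip empty PATH entries
            candidates.append(Path(entry) / dll_name)
    return candidates


# Example with a made-up plugin location; prints each candidate and
# whether a file exists there on this machine.
for candidate in cufft_candidates("plugins/dfttest2.dll"):
    print(candidate, candidate.exists())
```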
Known issues
- The binaries should be fused into a single package.
- Automatic backend selection is not implemented yet. It will be available in the next release, and the default value of backend will be set to None accordingly.
- tbsize=7 is not well optimized. This may not be addressed because of the unmatched code architecture.
Full Changelog: v1...v2
v1: Initial release
Usage
from dfttest2 import DFTTest
output = DFTTest(input)
The Python wrapper dfttest2.DFTTest() is the only stable interface and generally matches the original DFTTest plugin interface.
The new dfttest2.DFTTest2() interface is still a work-in-progress. The dfttest2.Backend interface may be changed in the future.
Performance is mostly limited by the memory bandwidth of the GPU. Expect ~2x performance on conventional hardware (i.e. with GDDR memory rather than HBM memory).
Specify backend=Backend.cuFFT(in_place=False) to use more efficient kernels at the cost of increased (1.15x ~ 1.30x) device memory usage.
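As a rough way to judge whether in_place=False fits on a given card, the 1.15x ~ 1.30x figure can be applied to a measured baseline. The helper below is a hypothetical sketch that assumes the factor is relative to the device memory used by the default cuFFT mode. The 1000 MiB baseline is an arbitrary example value, not a measurement.

```python
def out_of_place_memory_mib(baseline_mib: float) -> tuple:
    """Estimate the device memory range (MiB) for in_place=False,
    assuming the 1.15x ~ 1.30x factor applies to the given baseline."""
    low_factor, high_factor = 1.15, 1.30
    return (baseline_mib * low_factor, baseline_mib * high_factor)


low, high = out_of_place_memory_mib(1000.0)  # arbitrary example baseline
print(f"estimated range: {low:.0f} to {high:.0f} MiB")
```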
Benchmark for this release.
Known issues
- nlocation is not implemented yet
- other backends