Releases: AmusementClub/vs-dfttest2

v7: update libraries

27 Apr 07:56
  • Upgraded to CUDA 12.4.1.
  • Added support for HIP for AMD GPUs. No precompiled binaries yet.

v6: bug fixes

19 Aug 01:23
  • Fix missing headers (#6). Thanks @shssoichiro.
  • Fix dangling pointer during filter destruction.

v5: CPU dispatching, generic gcc backend

05 Oct 14:11
  • The CPU backend implements CPU dispatching through the opt parameter, as in VapourSynth-dfttest.

  • Added a GCC backend based on GCC vector extensions.

    Benchmark on ARM64 CPUs.

Known issue

  • g++ 11 may hang when building the GCC backend for the arm64 target.

v4: CPU backend (for x86_64)

31 Aug 10:27

The CPU backend only supports sbsize=16 and tbsize in [1, 3, 5, 7], because its implementation is based on the NVRTC backend (without run-time compilation).

The binary is compiled for avx2 in CI, but the source can be compiled for any x86_64 target (e.g. sse2, avx, avx2, avx-512).

Building with the latest version of the vector class library (2.02.00) is not recommended, since it zero-initializes vectors on the stack, which degrades performance.

Benchmark

Known issue

  • The binary is only compiled for avx2 currently. This should be improved in the future.

v3: optimization for temporal denoising

19 Aug 09:46
  • Improved temporal denoising performance of the NVRTC backend (~1.2x), especially for tbsize=7 (~2x) (benchmark).
  • Automatic backend selection is implemented. It will fail if the suggested backend is not found.
    • The selection is based on internal heuristics and may vary from one release to another.

Full Changelog: v2...v3

v2: nvrtc backend

02 Aug 11:08

The dfttest2.Backend.NVRTC backend is specialized for some popular parameter groups. It delivers up to 5x the performance of the existing cuFFT backend and a noticeable reduction in memory usage (benchmark).

Currently it supports sbsize=16 with tbsize=1, 3, 5, or 7. The cuFFT backend should be used for other cases.
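
The parameter constraint above can be sketched as a small helper that decides which backend to request. This is an illustration only: pick_backend_name is a hypothetical function, not part of dfttest2, and it encodes nothing beyond the sbsize/tbsize rule stated in this release note.

```python
# Hypothetical helper, not part of dfttest2: it only encodes the rule
# "NVRTC for sbsize=16 and tbsize in {1, 3, 5, 7}, cuFFT otherwise".
NVRTC_SBSIZE = 16
NVRTC_TBSIZES = (1, 3, 5, 7)


def pick_backend_name(sbsize: int, tbsize: int) -> str:
    """Return "NVRTC" for the specialized parameter groups, else "cuFFT"."""
    if sbsize == NVRTC_SBSIZE and tbsize in NVRTC_TBSIZES:
        return "NVRTC"
    return "cuFFT"
```

A caller could map the returned name to dfttest2.Backend.NVRTC or dfttest2.Backend.cuFFT before passing it as the backend argument.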

Please create an issue for other values of interest.

dfttest2.DFTTest and dfttest2.Backend are the only stable interfaces at present.

Usage

For the cuFFT backend, cufft64_10.dll should be placed in a folder named vsmlrt-cuda next to the plugin's DLL (just extract cufft-windows-VER.7z to the same directory as the plugin DLL), or in any location specified in PATH.
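
To check an installation before loading the plugin, the two locations described above can be probed with a short script. This is a sketch under assumptions: find_cufft_dll is a hypothetical helper, and the order it checks (the vsmlrt-cuda folder first, then PATH entries) is chosen for illustration and may not match how the plugin or the operating system actually resolves the DLL.

```python
import os
from pathlib import Path
from typing import Optional


def find_cufft_dll(plugin_dir: str, dll_name: str = "cufft64_10.dll") -> Optional[Path]:
    """Illustrative lookup, not the plugin's own loader.

    Checks <plugin_dir>/vsmlrt-cuda first, then every directory in PATH,
    and returns the first existing file (or None if nothing is found).
    """
    candidates = [Path(plugin_dir) / "vsmlrt-cuda" / dll_name]
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        if entry:  # skip empty PATH segments
            candidates.append(Path(entry) / dll_name)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A result of None suggests the DLL is in neither location and the cuFFT backend would likely fail to load.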

The NVRTC backend has no external dependencies.

Known issues

  • The binaries should be fused into a single package.
  • Automatic backend selection is not implemented yet. It will be available in the next release, and the default value of backend will be set to None accordingly.
  • tbsize=7 is not well optimized. This may not be addressed because of the unmatched code architecture.

Full Changelog: v1...v2

v1: Initial release

23 Jul 01:44

Usage

from dfttest2 import DFTTest

# `clip` is an existing VapourSynth video node
output = DFTTest(clip)

The Python wrapper dfttest2.DFTTest() is the only stable interface and generally matches the original DFTTest plugin interface.
The new dfttest2.DFTTest2() interface is still a work-in-progress. The dfttest2.Backend interface may be changed in the future.

Performance is mostly limited by the memory bandwidth of the GPU. Expect ~2x performance on conventional hardware (i.e. with GDDR memory rather than HBM memory).

Specify backend=Backend.cuFFT(in_place=False) to use more efficient kernels at the cost of increased (1.15x ~ 1.30x) device memory usage.
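
The memory trade-off can be estimated before switching. The sketch below simply scales a baseline figure by the 1.15x ~ 1.30x range quoted above; estimate_out_of_place_memory is a hypothetical helper, and the baseline is assumed to be a device memory usage you measured yourself with the default in-place setting.

```python
from typing import Tuple

# Factors quoted in the release note for in_place=False.
LOW_FACTOR = 1.15
HIGH_FACTOR = 1.30


def estimate_out_of_place_memory(baseline_mib: float) -> Tuple[float, float]:
    """Return the (low, high) estimate in MiB for in_place=False.

    Hypothetical helper: baseline_mib is a user-measured figure, not a
    value reported by dfttest2.
    """
    return (baseline_mib * LOW_FACTOR, baseline_mib * HIGH_FACTOR)
```

If the high estimate does not fit in the available device memory, staying with the default setting is the safer choice.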

Benchmark for this release.

Known issues

  • nlocation is not implemented yet.
  • Other backends are not implemented yet.