Fix segfaults when using CUDA #1397

@aswild aswild commented Dec 9, 2024

Summary: switch from using xxd to bin2c when generating the .ptx.c files so that the PTX data can be null-terminated.

With newer drivers or CUDA versions, vmaf now segfaults when trying to do anything on the GPU. The core dumps indicate that the crash happens somewhere inside the cuModuleLoadData calls in init_fex_cuda.

The documentation for cuModuleLoadData states that its image argument can be "obtained by mapping a cubin or PTX or fatbin file, [or] passing a cubin or PTX or fatbin file as a NULL-terminated text string...". It looks like VMAF is trying to do the latter, encoding PTX text files as an ASCII string using xxd, but there's no null terminator in the data because nothing asked for one.
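
To illustrate the difference, here is a minimal sketch in plain C (no CUDA needed). The array names and contents are hypothetical stand-ins for the generated .ptx.c data, not the real arrays: one holds only the file's bytes, the other has a trailing 0x00 appended. A reader that treats the first as a C string would run past the end of the array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an array generated without padding:
 * only the file's bytes, no trailing 0x00. */
static const unsigned char ptx_unpadded[] = {
    '.', 'v', 'e', 'r', 's', 'i', 'o', 'n', '\n'
};

/* The same bytes with one trailing null byte appended. */
static const unsigned char ptx_padded[] = {
    '.', 'v', 'e', 'r', 's', 'i', 'o', 'n', '\n', 0x00
};

/* Returns 1 if the buffer contains a 0x00 byte within its own bounds,
 * i.e. it can safely be read as a C string; returns 0 otherwise. */
static int has_nul_terminator(const unsigned char *data, size_t len)
{
    return memchr(data, 0, len) != NULL;
}
```

Calling has_nul_terminator(ptx_unpadded, sizeof ptx_unpadded) returns 0, while the padded array returns 1. A check like this could in principle run as a sanity test on the generated arrays.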

I'm a CUDA noob and don't know how this ever worked on older driver versions, but I tried editing the .ptx.c files by hand to add 0x00 bytes at the end and it worked!

Switch from xxd to bin2c (distributed with the cuda-nvcc package), which supports a --padd option to add a null byte to the PTX data, eliminating the segfaults. The arrays are renamed slightly to drop the src_ prefix, since bin2c doesn't automatically name the output array.
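
For comparison only: this PR fixes the data at build time, but the same guarantee could be sketched at runtime by copying an unterminated buffer into a terminated one before handing it to the loader. The helper below is hypothetical (copy_with_nul is not part of libvmaf or this change) and only shows the idea.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of data[0..len) followed by a
 * single 0x00 byte, or NULL on allocation failure. The caller frees it. */
static char *copy_with_nul(const unsigned char *data, size_t len)
{
    char *out = malloc(len + 1);
    if (!out)
        return NULL;
    memcpy(out, data, len);
    out[len] = '\0';
    return out;
}
```

The build-time approach avoids the extra allocation and copy on every module load, which is presumably why padding the generated array is the simpler fix.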

This should resolve #1357

@nilfm99 nilfm99 requested a review from kylophone December 9, 2024 19:05
@nilfm99
Collaborator

nilfm99 commented Dec 9, 2024

Thanks for the contribution! @kylophone is this something you could easily test?

@michalkielan
Contributor

I have the same issue as described in #1357 with the latest Nvidia driver and CUDA, and this fix works for me. In case testing is a blocker for this PR, I'm sharing the tests I've run to help move it forward. On master, both running vmaf_cuda through ffmpeg and vmaf's CUDA unit tests crash with CUDA_ERROR_INVALID_PTX; with the fix, neither fails. @nilfm99 if you want me to double-check something specific with ffmpeg + vmaf, I can run it on my setup.

Tested this on: NVIDIA GeForce RTX 3060, Driver Version: 570.86.16, CUDA Version: 12.8
Without this fix (upstream master) the unit tests are failing:

$ meson libvmaf/build libvmaf -Denable_avx512=true -Denable_cuda=true
$ ninja -vC libvmaf/build test
...
1 frame  ⢀⠀ 0.00 FPScode: 218; description: CUDA_ERROR_INVALID_PTX
vmaf: ../src/feature/cuda/integer_adm_cuda.c:1026: init_fex_cuda: Assertion `0' failed.
/home/michalk/Projects/vmaf/libvmaf/tools/test/test_vmaf_cuda_gpumask.sh: line 10: 156665 Aborted                 (core dumped) ./tools/vmaf --reference /dev/zero --distorted /dev/zero --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 --frame_cnt 2 --gpumask 0
...
test_cuda_no_init: pass
test_cuda_picture_preallocation_method_none: code: 218; description: CUDA_ERROR_INVALID_PTX
test_cuda_pic_preallocation: ../src/feature/cuda/integer_adm_cuda.c:1026: init_fex_cuda: Assertion `0' failed.
[1]    157225 IOT instruction (core dumped)  MSAN_OPTIONS= MALLOC_PERTURB_=221 ASAN_OPTIONS= MESON_TEST_ITERATION=1 =

With the fix (rebased to upstream master) the unit tests are passing:

...
 1/20 test_picture                        OK              0.04s
 2/20 test_feature_collector              OK              0.04s
 3/20 test_thread_pool                    OK              0.04s
 4/20 test_model                          OK              0.04s
 5/20 test_predict                        OK              0.03s
 6/20 test_dict                           OK              0.03s
 7/20 test_cpu                            OK              0.03s
 8/20 test_ref                            OK              0.02s
 9/20 test_feature                        OK              0.02s
10/20 test_ciede                          OK              0.02s
11/20 test_cambi                          OK              0.01s
12/20 test_luminance_tools                OK              0.01s
13/20 test_feature_extractor              OK              0.05s
14/20 test_cli_parse                      OK              0.01s
15/20 test_psnr                           OK              0.01s
16/20 test_propagate_metadata             OK              0.01s
17/20 test_ring_buffer                    OK              0.38s
18/20 test_vmaf_cuda_gpumask              OK              1.83s
19/20 test_cuda_pic_preallocation         OK              1.95s
20/20 test_framesync                      OK              5.01s

Ok:                 20
Expected Fail:      0
Fail:               0
Unexpected Pass:    0
Skipped:            0
Timeout:            0

ffmpeg:
Test videos generated by:

$ ffmpeg -f lavfi -i testsrc=duration=1:size=1280x720:rate=30 -pix_fmt yuv420p -c:v libx264 -preset fast -crf 23 reference.mkv
$ ffmpeg -hwaccel cuda -i reference.mkv -c:v hevc_nvenc -preset fast -qp 24 distorted.mkv

Without the fix, ffmpeg crashes on the CUDA_ERROR_INVALID_PTX assertion:

$ ffmpeg -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i distorted.mkv -hwaccel cuda -hwaccel_output_format cuda -i reference.mkv -filter_complex "[0:v]scale_npp=format=yuv420p[dis];[1:v]scale_npp=format=yuv420p[ref];[dis][ref]libvmaf_cuda" -f null -
Input #0, matroska,webm, from 'distorted.mkv':
  Metadata:
    ENCODER         : Lavf61.7.100
  Duration: 00:00:01.00, start: 0.000000, bitrate: 265 kb/s
  Stream #0:0: Video: hevc (Main), yuv420p(tv, progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn
      Metadata:
        ENCODER         : Lavc61.19.100 hevc_nvenc
        DURATION        : 00:00:01.000000000
Input #1, matroska,webm, from 'reference.mkv':
  Metadata:
    ENCODER         : Lavf61.7.100
  Duration: 00:00:01.00, start: 0.000000, bitrate: 203 kb/s
  Stream #1:0: Video: h264 (High), yuv420p(tv, progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn
      Metadata:
        ENCODER         : Lavc61.19.100 libx264
        DURATION        : 00:00:01.000000000
Stream mapping:
  Stream #0:0 (hevc) -> scale_npp:default
  Stream #1:0 (h264) -> scale_npp:default
  libvmaf_cuda:default -> Stream #0:0 (wrapped_avframe)
Press [q] to stop, [?] for help
code: 218; description: CUDA_ERROR_INVALID_PTX
ffmpeg: ../src/feature/cuda/integer_adm_cuda.c:1026: init_fex_cuda: Assertion `0' failed.
[1]    209899 IOT instruction (core dumped)  ffmpeg -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i distorted.mkv

With the fix:

$ ffmpeg -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i distorted.mkv -hwaccel cuda -hwaccel_output_format cuda -i reference.mkv -filter_complex "[0:v]scale_npp=format=yuv420p[dis];[1:v]scale_npp=format=yuv420p[ref];[dis][ref]libvmaf_cuda" -f null -
Input #0, matroska,webm, from 'distorted.mkv':
  Metadata:
    ENCODER         : Lavf61.7.100
  Duration: 00:00:01.00, start: 0.000000, bitrate: 265 kb/s
  Stream #0:0: Video: hevc (Main), yuv420p(tv, progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn
      Metadata:
        ENCODER         : Lavc61.19.100 hevc_nvenc
        DURATION        : 00:00:01.000000000
Input #1, matroska,webm, from 'reference.mkv':
  Metadata:
    ENCODER         : Lavf61.7.100
  Duration: 00:00:01.00, start: 0.000000, bitrate: 203 kb/s
  Stream #1:0: Video: h264 (High), yuv420p(tv, progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn
      Metadata:
        ENCODER         : Lavc61.19.100 libx264
        DURATION        : 00:00:01.000000000
Stream mapping:
  Stream #0:0 (hevc) -> scale_npp:default
  Stream #1:0 (h264) -> scale_npp:default
  libvmaf_cuda:default -> Stream #0:0 (wrapped_avframe)
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf61.7.100
  Stream #0:0: Video: wrapped_avframe, cuda(tv, progressive), 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn
      Metadata:
        encoder         : Lavc61.19.100 wrapped_avframe
[Parsed_libvmaf_cuda_2 @ 0x7335a8007280] VMAF score: 97.512795
[out#0/null @ 0x59af9bc4ea80] video:13KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: unknown
frame=   30 fps=0.0 q=-0.0 Lsize=N/A time=00:00:01.00 bitrate=N/A speed=7.08x
