Skip to content

C-style API/Python bindings not working on Windows #429

Open
@Letorshillen

Description

@Letorshillen

The c-style api/python bindings are not working on windows. On Linux/MacOS/WSL the code snippets from issue #9 are working without any trouble. But on windows the adjusted code is just shut down as you can see in the comment below.

For some reason none of these solutions seem to work now on Windows (both 10 and 11). Every time on multiple machines the Python interpreter exits and shuts down without any error during the invocation of whisper_full. Does anyone have any ideas?

So far I've used both my cython extension and the various ctypes (using updated versions of the examples provided here). I've even made sure the compiler is MSVC, and it matches the version used by Python - I am using Python 3.10. With thecython extension I've tried linking against the dynamic library, static library and also tried just object files. Compiled both using the CMake scripts and an updated Makefile.

This only happens on Windows and the exact same code works fine on MacOS and Linux. Without windows compatibility this extension becomes useless for its original purpose. If anyone could help I would greatly appreciate it.

The only clue I've got is that; one time when using a (Debug) DLL created by VS 2020 (with a more recent compiler than python3.10!) with a solution from CMake, a complaint was given about a rv==0 assert presumably this is a threading issue possibly not using pthreads however all the default CMake options were used to create the solution file. Nothing was edited. Just the latest version was built straight. pthreads should have been used. This was using the latest ctypes solution mentioned. This hasn't been able to be reproduced though.

Originally posted by @O4DEV in #9 (comment)

Here is the adjusted code for windows:

import ctypes
import pathlib

# this is needed to read the WAV file properly
from scipy.io import wavfile

libname = "libwhisper"
fname_model = "models/ggml-tiny.en.bin"
fname_wav = "samples/jfk.wav"

# this needs to match the C struct in whisper.h
class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy", ctypes.c_int),
        #
        ("n_max_text_ctx", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("offset_ms", ctypes.c_int),
        ("duration_ms", ctypes.c_int),
        #
        ("translate", ctypes.c_bool),
        ("no_context", ctypes.c_bool),
        ("single_segment", ctypes.c_bool),
        ("print_special", ctypes.c_bool),
        ("print_progress", ctypes.c_bool),
        ("print_realtime", ctypes.c_bool),
        ("print_timestamps", ctypes.c_bool),
        #
        ("token_timestamps", ctypes.c_bool),
        ("thold_pt", ctypes.c_float),
        ("thold_ptsum", ctypes.c_float),
        ("max_len", ctypes.c_int),
        ("max_tokens", ctypes.c_int),
        #
        ("speed_up", ctypes.c_bool),
        ("audio_ctx", ctypes.c_int),
        #
        ("prompt_tokens", ctypes.c_void_p),
        ("prompt_n_tokens", ctypes.c_int),
        #
        ("language", ctypes.c_char_p),
        #
        ("suppress_blank", ctypes.c_bool),
        #
        ("temperature_inc", ctypes.c_float),
        ("entropy_thold", ctypes.c_float),
        ("logprob_thold", ctypes.c_float),
        ("no_speech_thold", ctypes.c_float),
        #
        ("greedy", ctypes.c_int * 1),
        ("beam_search", ctypes.c_int * 3),
        #
        ("new_segment_callback", ctypes.c_void_p),
        ("new_segment_callback_user_data", ctypes.c_void_p),
        #
        ("encoder_begin_callback", ctypes.c_void_p),
        ("encoder_begin_callback_user_data", ctypes.c_void_p),
    ]


if __name__ == "__main__":
    # load library and model
    libname = str(pathlib.Path().absolute() / libname)
    whisper = ctypes.WinDLL(libname, winmode=1)

    # tell Python what are the return types of the functions
    whisper.whisper_init_from_file.restype = ctypes.c_void_p
    whisper.whisper_full_default_params.restype = WhisperFullParams
    whisper.whisper_full_get_segment_text.restype = ctypes.c_char_p

    # initialize whisper.cpp context
    ctx = whisper.whisper_init_from_file(fname_model.encode("utf-8"))

    # get default whisper parameters and adjust as needed
    params = whisper.whisper_full_default_params()
    params.print_realtime = True
    params.print_progress = False

    # load WAV file
    samplerate, data = wavfile.read(fname_wav)

    # convert to 32-bit float
    data = data.astype("float32") / 32768.0

    # run the inference
    result = whisper.whisper_full(
        ctypes.c_void_p(ctx),
        params,
        data.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
        len(data),
    )
    if result != 0:
        print("Error: {}".format(result))
        exit(1)

    # print results from Python
    # print("\nResults from Python:\n")
    n_segments = whisper.whisper_full_n_segments(ctypes.c_void_p(ctx))
    for i in range(n_segments):
        t0 = whisper.whisper_full_get_segment_t0(ctypes.c_void_p(ctx), i)
        t1 = whisper.whisper_full_get_segment_t1(ctypes.c_void_p(ctx), i)
        txt = whisper.whisper_full_get_segment_text(ctypes.c_void_p(ctx), i)

        print(f"{t0/1000.0:.3f} - {t1/1000.0:.3f} : {txt.decode('utf-8')}")

    # free the memory
    whisper.whisper_free(ctypes.c_void_p(ctx))

Metadata

Metadata

Assignees

No one assigned

    Labels

    bindingsBindings for other languages

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions