Description
The C-style API/Python bindings are not working on Windows. On Linux, macOS, and WSL the code snippets from issue #9 work without any trouble, but on Windows the adjusted code simply shuts down, as described in the comment below.
For some reason none of these solutions seem to work now on Windows (both 10 and 11). Every time, on multiple machines, the Python interpreter exits and shuts down without any error during the invocation of whisper_full. Does anyone have any ideas?
So far I've used both my cython extension and the various ctypes solutions (using updated versions of the examples provided here). I've even made sure the compiler is MSVC and that it matches the version used by Python; I am using Python 3.10. With the cython extension I've tried linking against the dynamic library, the static library, and also just the object files, compiled both using the CMake scripts and an updated Makefile.
This only happens on Windows; the exact same code works fine on macOS and Linux. Without Windows compatibility this extension becomes useless for its original purpose. If anyone could help I would greatly appreciate it.
The only clue I've got is that one time, when using a (Debug) DLL created by VS 2020 (with a more recent compiler than python3.10!) with a solution generated by CMake, a complaint was given about a rv==0 assert. Presumably this is a threading issue, possibly caused by pthreads not being used; however, all the default CMake options were used to create the solution file. Nothing was edited and the latest version was built as-is, so pthreads should have been used. This was using the latest ctypes solution mentioned. I haven't been able to reproduce this, though.
Originally posted by @O4DEV in #9 (comment)
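Since the interpreter exits without printing anything, one way to get at least a Python-level stack at the moment of the crash is the standard-library faulthandler module. The following is only a minimal sketch: the helper name enable_crash_trace is made up for illustration, and whether a trace actually appears depends on how the process dies.

```python
import faulthandler
import sys


def enable_crash_trace(stream=None):
    """Ask CPython to dump the Python traceback when a fatal error occurs.

    `stream` must be a real file object with a file descriptor; it defaults
    to sys.stderr. The helper name is hypothetical, not part of whisper.cpp.
    """
    target = stream if stream is not None else sys.stderr
    # all_threads=True also prints the stacks of the other Python threads.
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()


# Intended use: call enable_crash_trace() once at the top of the script,
# before the DLL is loaded and before whisper_full is invoked.
```

The same handler can be switched on without touching the script by starting the interpreter with `python -X faulthandler`. If the process is killed by a native fault inside the DLL, this may show which Python line was executing, which would at least confirm that the crash happens inside the whisper_full call.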
Here is the adjusted code for Windows:
import ctypes
import pathlib
# this is needed to read the WAV file properly
from scipy.io import wavfile
libname = "libwhisper"
fname_model = "models/ggml-tiny.en.bin"
fname_wav = "samples/jfk.wav"
# this needs to match the C struct in whisper.h
class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy", ctypes.c_int),
        #
        ("n_max_text_ctx", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("offset_ms", ctypes.c_int),
        ("duration_ms", ctypes.c_int),
        #
        ("translate", ctypes.c_bool),
        ("no_context", ctypes.c_bool),
        ("single_segment", ctypes.c_bool),
        ("print_special", ctypes.c_bool),
        ("print_progress", ctypes.c_bool),
        ("print_realtime", ctypes.c_bool),
        ("print_timestamps", ctypes.c_bool),
        #
        ("token_timestamps", ctypes.c_bool),
        ("thold_pt", ctypes.c_float),
        ("thold_ptsum", ctypes.c_float),
        ("max_len", ctypes.c_int),
        ("max_tokens", ctypes.c_int),
        #
        ("speed_up", ctypes.c_bool),
        ("audio_ctx", ctypes.c_int),
        #
        ("prompt_tokens", ctypes.c_void_p),
        ("prompt_n_tokens", ctypes.c_int),
        #
        ("language", ctypes.c_char_p),
        #
        ("suppress_blank", ctypes.c_bool),
        #
        ("temperature_inc", ctypes.c_float),
        ("entropy_thold", ctypes.c_float),
        ("logprob_thold", ctypes.c_float),
        ("no_speech_thold", ctypes.c_float),
        #
        ("greedy", ctypes.c_int * 1),
        ("beam_search", ctypes.c_int * 3),
        #
        ("new_segment_callback", ctypes.c_void_p),
        ("new_segment_callback_user_data", ctypes.c_void_p),
        #
        ("encoder_begin_callback", ctypes.c_void_p),
        ("encoder_begin_callback_user_data", ctypes.c_void_p),
    ]
if __name__ == "__main__":
    # load library and model
    libname = str(pathlib.Path().absolute() / libname)
    whisper = ctypes.WinDLL(libname, winmode=1)
    # tell Python what are the return types of the functions
    whisper.whisper_init_from_file.restype = ctypes.c_void_p
    whisper.whisper_full_default_params.restype = WhisperFullParams
    whisper.whisper_full_get_segment_text.restype = ctypes.c_char_p
    # initialize whisper.cpp context
    ctx = whisper.whisper_init_from_file(fname_model.encode("utf-8"))
    # get default whisper parameters and adjust as needed
    params = whisper.whisper_full_default_params()
    params.print_realtime = True
    params.print_progress = False
    # load WAV file
    samplerate, data = wavfile.read(fname_wav)
    # convert to 32-bit float
    data = data.astype("float32") / 32768.0
    # run the inference
    result = whisper.whisper_full(
        ctypes.c_void_p(ctx),
        params,
        data.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
        len(data),
    )
    if result != 0:
        print("Error: {}".format(result))
        exit(1)
    # print results from Python
    # print("\nResults from Python:\n")
    n_segments = whisper.whisper_full_n_segments(ctypes.c_void_p(ctx))
    for i in range(n_segments):
        t0 = whisper.whisper_full_get_segment_t0(ctypes.c_void_p(ctx), i)
        t1 = whisper.whisper_full_get_segment_t1(ctypes.c_void_p(ctx), i)
        txt = whisper.whisper_full_get_segment_text(ctypes.c_void_p(ctx), i)
        print(f"{t0/1000.0:.3f} - {t1/1000.0:.3f} : {txt.decode('utf-8')}")
    # free the memory
    whisper.whisper_free(ctypes.c_void_p(ctx))
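
The script above sets only return types, so ctypes has to guess how each argument is passed. One thing that might be worth ruling out is an argument-marshalling mismatch, by declaring the argument types explicitly. The sketch below is not a confirmed fix: the helper name declare_signatures is hypothetical, the argument lists are inferred from how the script calls each function, and the 64-bit return type for the segment timestamps is an assumption that should be checked against the whisper.h actually being built.

```python
import ctypes


def declare_signatures(lib, params_type):
    """Attach argtypes/restype to the whisper functions used in the script.

    `lib` is the loaded library object and `params_type` is the
    ctypes.Structure subclass mirroring whisper_full_params.
    Hypothetical helper; signatures are assumptions, see the comments.
    """
    ctx_p = ctypes.c_void_p
    float_p = ctypes.POINTER(ctypes.c_float)

    lib.whisper_init_from_file.argtypes = [ctypes.c_char_p]
    lib.whisper_init_from_file.restype = ctx_p

    # Only the return type is declared here, matching the call in the script.
    lib.whisper_full_default_params.restype = params_type

    # The parameter struct is passed by value, as in the script.
    lib.whisper_full.argtypes = [ctx_p, params_type, float_p, ctypes.c_int]
    lib.whisper_full.restype = ctypes.c_int

    lib.whisper_full_n_segments.argtypes = [ctx_p]
    lib.whisper_full_n_segments.restype = ctypes.c_int

    # Assumption: the timestamps are int64_t in whisper.h; the ctypes default
    # restype (c_int) would truncate them.
    for name in ("whisper_full_get_segment_t0", "whisper_full_get_segment_t1"):
        fn = getattr(lib, name)
        fn.argtypes = [ctx_p, ctypes.c_int]
        fn.restype = ctypes.c_int64

    lib.whisper_full_get_segment_text.argtypes = [ctx_p, ctypes.c_int]
    lib.whisper_full_get_segment_text.restype = ctypes.c_char_p

    lib.whisper_free.argtypes = [ctx_p]
    lib.whisper_free.restype = None
    return lib
```

In the script this would be called once, right after the library is loaded, as declare_signatures(whisper, WhisperFullParams). With argtypes in place ctypes raises an ArgumentError on a mismatched call instead of passing a wrongly sized value to the DLL, which can turn a silent exit into a readable Python exception.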