Big drop in accuracy compared to Python version?

If I run the Python and C++ versions of Whisper on the same dataset with the large model on a CPU, the Python version scores 93% accuracy (based on WER), whereas this C++ version scores only 75%.
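For reference, WER (word error rate) is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model output, divided by the number of reference words. It is an error rate, so lower is better, and word accuracy is commonly reported as 1 − WER. Below is a minimal Python sketch of that computation. It assumes plain whitespace tokenization and no text normalization, and the function name `word_error_rate` is hypothetical, not part of either Whisper implementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are whitespace-separated words; no casing or
    punctuation normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (rolling dynamic-programming row).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    wer = word_error_rate(ref, "the cat sat on mat")  # one deletion out of 6 words
    print(f"WER = {wer:.3f}, accuracy = {1 - wer:.3f}")
```

Because insertions are counted, WER can exceed 100% (and accuracy can go negative). Scores also depend on how the transcripts are normalized before comparison, so a fair comparison should run both systems' outputs through the same scoring code.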

Why the big drop in accuracy?

The C++ version is about 80% faster than the Python version, but shouldn't the accuracy be the same if the two implementations are equivalent?