Release CTranslate2 3.2.0 · OpenNMT/CTranslate2

New features

Add decoding option suppress_sequences to prevent specific sequences of tokens from being generated
Add decoding option end_token to stop decoding on a token other than the model's EOS token
Allow returning multiple random hypotheses from greedy search + random sampling when setting num_hypotheses > 1
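
As a rough sketch of how the three options above might be combined, the snippet below builds keyword arguments for ctranslate2.Translator.translate_batch. The token strings and the model path are made up for illustration, and the call itself is left in comments because it needs a converted model.

```python
# Keyword arguments for ctranslate2.Translator.translate_batch.
# All token values below are illustrative assumptions, not from the release notes.
decoding_options = {
    "beam_size": 1,        # greedy search
    "sampling_topk": 10,   # random sampling among the 10 most likely tokens
    "num_hypotheses": 3,   # return 3 independently sampled hypotheses
    "suppress_sequences": [["very", "bad"]],  # never generate this token sequence
    "end_token": "<sep>",  # stop on this token instead of the model EOS token
}

# Usage with a converted model (not executed here):
#   import ctranslate2
#   translator = ctranslate2.Translator("model_dir")  # hypothetical path
#   results = translator.translate_batch([["Hello", "world"]], **decoding_options)
#   for hypothesis in results[0].hypotheses:
#       print(hypothesis)
```

With beam_size set to 1 and sampling enabled, each returned hypothesis is a separate random sample, so the hypotheses may differ from run to run.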

Fixes and improvements

Improve support for batch generation with the Whisper model:
- Improve performance of batch generation with a context (we only require the prompts to have the same length, which is easily done by adapting the number of previous text tokens)
- Support batch mode for option return_no_speech_prob
- Support cases where some prompts in the batch have the token <|notimestamps|> but not others
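
The equal-length requirement for prompts with a context can be met by trimming the previous text tokens, as in the sketch below. The helper equalize_prompts is hypothetical and not part of CTranslate2. It assumes an English transcription prompt and drops the oldest previous tokens first.

```python
def equalize_prompts(items):
    """Build equal-length Whisper prompts (hypothetical helper).

    items: list of (previous_text_tokens, without_timestamps) pairs.
    """
    parts = []
    for previous, without_timestamps in items:
        sot = ["<|startoftranscript|>", "<|en|>", "<|transcribe|>"]
        if without_timestamps:
            sot.append("<|notimestamps|>")
        parts.append((list(previous), sot))

    # The shortest prompt sets the common length (1 accounts for <|startofprev|>).
    target = min(1 + len(previous) + len(sot) for previous, sot in parts)

    prompts = []
    for previous, sot in parts:
        keep = target - 1 - len(sot)  # number of previous text tokens to keep
        if keep < 0:
            raise ValueError("cannot equalize: a prompt has too little context")
        # Keep only the most recent tokens; slicing from len - keep also handles keep == 0.
        recent = previous[len(previous) - keep:]
        prompts.append(["<|startofprev|>"] + recent + sot)
    return prompts


prompts = equalize_prompts([
    (["a", "b", "c", "d"], False),  # with timestamps
    (["x", "y"], True),             # without timestamps
])
print([len(p) for p in prompts])
```

The resulting lists could then be passed as the prompts argument of ctranslate2.models.Whisper.generate, together with the batched audio features.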
Enable the Conv1D layer in more Python wheels:
- macOS x64 (using oneDNN)
- macOS ARM64 (using a custom implementation)
- Linux AArch64 (using a custom implementation)
Update the OpenNMT-py converter to support the latest checkpoint structure
Generalize the TransformerSpec constructor to accept arbitrary encoder and decoder specifications
Remove the global compilation flag -ffast-math, which introduces unwanted side effects, and enable it only for the layer norm CPU kernel, where it is actually useful
Fix CMake error on Windows when setting -DOPENMP_RUNTIME=COMP
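
As general background on the -ffast-math change above (not taken from the release notes): the flag lets the compiler reorder floating-point operations as if they were associative. The sketch below uses plain Python floats to show that such reordering can change a result. It does not reproduce CTranslate2's kernels.

```python
# Floating-point addition is not associative, so reordering can change the result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # evaluation order as written
right = a + (b + c)  # a reordering that fast-math style optimizations may choose

print(left == right)
print(abs(left - right))
```

The difference is tiny here, but it can accumulate or interact badly with code that relies on strict IEEE semantics, which is one reason to limit the flag to kernels where it is known to help.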