Releases: microsoft/BlingFire
Bling Fire v0.1.8
- Added an IdsToText API for all models that return IDs; for an example, see https://github.com/microsoft/BlingFire/blob/master/scripts/blingfire_example.py
Bling Fire v0.1.7
- Added a no_dummy_prefix configuration option and an API to change the configuration of an existing model
- Fixed the offset of the dummy prefix: it is now always -1. If the first token has a start/end offset of -1, the dummy prefix is included in the output
- Changed the compilation options for Windows code
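The offset convention above can be sketched in a few lines of Python. This is only an illustration, not Bling Fire code: the `(token_id, start, end)` tuple layout, the helper names `split_dummy_prefix` and `spans`, and the inclusive end offset that indexes directly into the string are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical token record: (token_id, start_offset, end_offset).
Token = Tuple[int, int, int]

# Per the release note, the dummy prefix always carries offset -1.
DUMMY_OFFSET = -1


def split_dummy_prefix(tokens: List[Token]) -> Tuple[bool, List[Token]]:
    """Return (has_dummy_prefix, tokens_without_the_prefix)."""
    if tokens and tokens[0][1] == DUMMY_OFFSET and tokens[0][2] == DUMMY_OFFSET:
        return True, tokens[1:]
    return False, tokens


def spans(text: str, tokens: List[Token]) -> List[str]:
    """Map tokens back to substrings of text.

    Assumes end offsets are inclusive and index characters of `text`;
    real tokenizer output may use byte offsets instead.
    """
    _, real_tokens = split_dummy_prefix(tokens)
    return [text[start:end + 1] for _, start, end in real_tokens]
```

A caller that aligns tokens with the input text would drop the leading token whenever its offsets are both -1, since that token does not correspond to any position in the input.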
Bling Fire v0.1.5
- Added byte BPE algorithm support
- Added GPT2 and RoBERTa tokenization models
- Added hyphenation / syllabification APIs and a sample model: syllab
- Added URL tokenization models: uri100k, uri250k, uri500k
- Made some small changes to the C# interface (it should remain backwards compatible): it now uses Span instead of byte[] to allow on-stack allocation of input and output buffers
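For readers unfamiliar with byte BPE, the general idea can be sketched as follows. This is a toy illustration of the technique, not Bling Fire's implementation: the function name `byte_bpe` and the `ranks` merge table are hypothetical, and a real model would ship a learned merge table.

```python
from typing import Dict, List, Tuple


def byte_bpe(text: str, ranks: Dict[Tuple[bytes, bytes], int]) -> List[bytes]:
    """Greedy byte-level BPE over the UTF-8 bytes of text.

    Starts from single bytes and repeatedly merges the adjacent pair
    with the lowest rank (leftmost on ties) until no known pair remains.
    """
    parts = [bytes([b]) for b in text.encode("utf-8")]
    while len(parts) > 1:
        # Collect (rank, position) for every adjacent pair in the merge table.
        candidates = [
            (ranks[(parts[i], parts[i + 1])], i)
            for i in range(len(parts) - 1)
            if (parts[i], parts[i + 1]) in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts
```

Because the starting alphabet is the 256 possible byte values, any input can be tokenized without an unknown-token fallback; a non-ASCII character with no applicable merges simply stays split into its UTF-8 bytes.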
blingfire pypi package v0.1.3
- Supports four tokenization algorithms: patterns, word-piece, unigram LM, and BPE
- Added a space normalization API
- Added a few more popular models
- Added unigram LM tokenization models trained on about 84 uniformly represented languages from the WikiMatrix set
- Bug fixes and parity fixes