UDPipe 1.1.0

@foxik foxik released this 29 Mar 10:24

Changes since UDPipe 1.0.0:

  • Morphodita_parsito models (now version 3) require at least UDPipe version 1.1.0.
  • CoNLL-U v2 format is supported. Notably, spaces are now allowed in forms and lemmas, and empty nodes are allowed as well.
  • input_format and output_format instances now accept options.
  • Preserve all spacing when tokenizing.
  • Optionally generate document-level token ranges in the original text.
  • Optionally respect given segmentation during tokenization.
  • Tokenizer can be trained to allow spaces in tokens (default if there are forms with spaces in the training data).
  • Parser can be trained to always return exactly one root per sentence (default).
  • Improve input_format API to allow inter-block state (for correct tracking of inter-sentence spaces and document-level offsets).
  • Improve output_format API to support begin/end document marks and to allow state in the output_format instance (to allow numbering output sentences, for example).
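
To illustrate what the CoNLL-U v2 changes mean for code that consumes UDPipe output, here is a minimal sketch in plain Python. It does not use the UDPipe API. The helper name parse_conllu_line and the sample lines are hypothetical. It relies only on the CoNLL-U v2 conventions that the ten columns are tab-separated (so a form or lemma may contain spaces), that empty nodes have IDs such as 2.1, and that multiword tokens have ID ranges such as 3-4.

```python
def parse_conllu_line(line):
    """Split one CoNLL-U token line into a small dict (hypothetical helper).

    Columns are separated by tabs only, so spaces inside FORM or LEMMA
    are preserved as part of the field.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 tab-separated columns, got %d" % len(cols))

    token_id = cols[0]
    if "-" in token_id:
        kind = "multiword"   # ID range such as 3-4
    elif "." in token_id:
        kind = "empty"       # empty node such as 2.1
    else:
        kind = "word"        # ordinary syntactic word

    return {
        "id": token_id,
        "kind": kind,
        "form": cols[1],
        "lemma": cols[2],
        "misc": cols[9],
    }


# Hand-written sample lines (not real UDPipe output).
sample = [
    "1\tNew York\tNew York\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "2.1\tsleeps\tsleep\tVERB\t_\t_\t_\t_\t0:root\t_",
]
parsed = [parse_conllu_line(line) for line in sample]

for token in parsed:
    print(token["id"], token["kind"], repr(token["form"]))
```

A reader that splits on arbitrary whitespace instead of tabs would break on the first sample line, which is the main practical consequence of allowing spaces in forms and lemmas.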