My first attempt at using DL4J Arbiter together with OpenNLP, originally written in 2017.

The code follows this pattern:

  • Train and evaluate routines that try multiple parameter combinations
  • Code to do that for sentdetect, tokenizer and POS
  • Each package has a main class: Arbiter.java
  • Specific optimization functions were created, for example:
    • For sentence detection, it tries to infer the end-of-sentence characters for each language automatically by counting them. These characters are needed to configure the model.
    • For POS, a tool was created to generate the feature XML automatically, but the search space gets really big.
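
The end-of-sentence inference mentioned above can be sketched roughly as follows. This is a minimal illustration and not the code from this repository: the class `EosCharCounter`, the method `inferEosChars`, and the `minShare` threshold are hypothetical names, and the sketch assumes the input is already split into one sentence per string. It counts the last non-letter, non-digit character of each sentence and keeps the characters that end a large enough share of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from this repository.
final class EosCharCounter {

    // Returns the characters that end at least minShare of the non-empty
    // sentences, most frequent first. Assumes one sentence per input string.
    static List<Character> inferEosChars(List<String> sentences, double minShare) {
        Map<Character, Integer> counts = new HashMap<>();
        int total = 0;
        for (String sentence : sentences) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines do not count as sentences
            }
            total++;
            char last = trimmed.charAt(trimmed.length() - 1);
            // Letters and digits are never treated as sentence terminators.
            if (!Character.isLetterOrDigit(last)) {
                counts.merge(last, 1, Integer::sum);
            }
        }

        List<Character> result = new ArrayList<>();
        for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minShare * total) {
                result.add(entry.getKey());
            }
        }
        // Sort by descending count, breaking ties by character value.
        result.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : Character.compare(a, b);
        });
        return result;
    }
}
```

The resulting characters could then be handed to the sentence detector configuration as its end-of-sentence set. A simple count like this ignores closing quotes and brackets that follow the real terminator, so a real implementation would probably need extra filtering per language.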

To run it, download the UD 2.0 dataset from here: https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1983

Newer versions of the UD dataset are available, currently 2.8.