
The release and demo package of CRFSharp v1.2.0.0

@zhongkaifu zhongkaifu released this 15 Mar 18:55
· 44 commits to master since this release

CRFSharp is a .NET (C#) implementation of Conditional Random Fields (CRF), a machine learning algorithm for learning from labeled sequences of examples. It is widely used in natural language processing (NLP) tasks such as word breaking, POS tagging, and named entity recognition. This package contains the binary files and demo package of CRFSharp v1.0.0.0.

The binary files are under the "bin" folder. To get the latest version or the source code, visit the project website at https://github.com/zhongkaifu/CRFSharp

The following is an introduction to the demo packages.

Demo 1. Named entity recognizer in English
This demo labels named entity types in given text. So far, it supports location, person, and organization names. To build the model file, run the batch file "build_english_ner_demo.bat", which calls the CRFSharp encoder to train the model. After training finishes, run the batch file "test_english_ner_demo.bat" to test the model.

Training corpus: .\data\demo_english\corpus
Testing corpus: .\data\demo_english\test
Template file: .\data\demo_english\template.NE

The trained model file is saved in the ".\data\demo_english\model" folder, and the test result is generated in the ".\data\demo_english\test" folder. The training corpus is already in a format that the CRFSharp encoder can process directly.
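
As a rough illustration of a corpus that an encoder can read directly, the sketch below assumes the CRF++-style column layout: one token per line, tab-separated feature columns with the tag in the last column, and a blank line between sentences. That layout, the tag names in SAMPLE, and the read_sentences helper are assumptions made for illustration; they are not taken from the demo files.

```python
def read_sentences(lines):
    """Group column-formatted lines into sentences.

    Assumed layout (not confirmed by the demo package): tab-separated
    columns, tag in the last column, blank line between sentences.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split("\t")
        # Everything except the last column is a feature; the last is the tag.
        current.append((columns[:-1], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample; the tag names are invented for this sketch.
SAMPLE = [
    "John\tS_PER",
    "lives\tNOR",
    "in\tNOR",
    "Paris\tS_LOC",
    "",
    "Acme\tB_ORG",
    "Corp\tE_ORG",
]

sentences = read_sentences(SAMPLE)
```

A reader like this can be used to sanity-check a corpus (for example, that every line has the same number of columns) before starting a long training run.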

Demo 2. Organization inner-structure parser in Chinese

This demo labels the inner structure of a given organization name in Chinese. It supports location name, core name, modifier name, and suffix name. To build the model file, run the batch file "build_chinese_org_parser_model.bat". After the CRFSharp encoder finishes successfully, run the batch file "test_chinese_org_parser_model.bat" to test the model.

Training corpus: .\data\demo_org_chinese\corpus
Testing corpus: .\data\demo_org_chinese\test
Template file: .\data\demo_org_chinese\template.1

The trained model file is saved in the ".\data\demo_org_chinese\model" folder, and the test result is generated in the ".\data\demo_org_chinese\test" folder. The training corpus is in a raw, human-readable format. Before the CRFSharp encoder processes it, corpus2tag.exe is called to convert the format, using the ".\data\demo_org_chinese\tags.exe" file, which maps named entity types.
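
Both demos follow the same sequence: run the build batch file, then run the test batch file once training has succeeded. A minimal sketch of that sequence is shown below. The batch file names come from the demo descriptions; the DEMOS keys, the demo_commands and run_demo helpers, and the assumption that the batch files sit in the package root and are launched through "cmd /c" on Windows are illustrative only.

```python
import subprocess

# Batch file names as given in the demo descriptions.
# The dictionary keys are hypothetical labels chosen for this sketch.
DEMOS = {
    "english_ner": (
        "build_english_ner_demo.bat",
        "test_english_ner_demo.bat",
    ),
    "chinese_org": (
        "build_chinese_org_parser_model.bat",
        "test_chinese_org_parser_model.bat",
    ),
}


def demo_commands(demo):
    """Return the build and test commands for a demo, in run order."""
    build, test = DEMOS[demo]
    return [["cmd", "/c", build], ["cmd", "/c", test]]


def run_demo(demo, package_dir="."):
    """Run build then test; stop if the build step fails.

    Assumes Windows and that package_dir is the unpacked demo package.
    """
    for command in demo_commands(demo):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the test step never runs against a failed build.
        subprocess.run(command, cwd=package_dir, check=True)
```

For example, run_demo("english_ner", package_dir) would train the English model and then test it, where package_dir is wherever the package was unpacked.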