derlem/susamuru

Automated Generation of Dataset for Named Entity Recognition Task

This project generates an NER dataset using Wikipedia and Wikidata. It was developed especially for morphologically challenging, low-resource languages.

Tools (all of these are subject to change)

  • Python 3.6
  • pipenv

How to run

  • First, clone the repository to your computer: git clone git@github.com:derlem/susamuru.git
  • Go to the susamuru/susamuru directory: cd susamuru/susamuru
  • Install the dependencies: pipenv install
  • Enter the pipenv shell: pipenv shell
  • Download the Wikipedia tr pages dump (the latest dump available). Here is the link
  • Extract the dump into the susamuru/susamuru/dumps folder.
  • Now you are good to go. Start the execution with: pipenv run python susamuru.py
  • After the execution finishes, you should see the output file in the susamuru/susamuru/output folder.
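
Before starting a long run, it can help to confirm that the dump was actually extracted into the dumps folder. The following is a minimal sketch and not part of the project: the helper names list_dump_files and preflight are hypothetical, and it only assumes that the dump ends up as one or more regular files directly under dumps, as described in the steps above.

```python
from pathlib import Path


def list_dump_files(dump_dir):
    """Return the regular, non-hidden files directly under dump_dir, sorted by path."""
    root = Path(dump_dir)
    if not root.is_dir():
        # A missing folder is treated the same as an empty one.
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )


def preflight(project_dir="."):
    """Print what was found in <project_dir>/dumps; return True if any file is present."""
    dump_dir = Path(project_dir) / "dumps"
    dumps = list_dump_files(dump_dir)
    if not dumps:
        print(f"No dump file found in {dump_dir}")
        return False
    for p in dumps:
        print(f"Found {p.name} ({p.stat().st_size} bytes)")
    return True
```

Calling preflight() from susamuru/susamuru before pipenv run python susamuru.py reports whether the folder is populated. It does not check that the file is a valid Wikipedia dump.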

Note:

  • In countries where access to Wikipedia is restricted, consider using a VPN.
