This project generates a NER dataset from Wikipedia and Wikidata. It was developed especially for morphologically challenging and low-resource languages.
- Python 3.6
- pipenv
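As a quick sanity check before installing, the two requirements above can be verified from Python. This is only a sketch: the helper name `check_requirements` is hypothetical and not part of this repository, and it treats 3.6 as a minimum version, although the project's Pipfile may pin the version more strictly.

```python
import shutil
import sys


def check_requirements(min_version=(3, 6), tool="pipenv"):
    """Return a list of problems found; an empty list means everything looks fine.

    Hypothetical helper, not part of the susamuru code base.
    """
    problems = []
    # sys.version_info compares element-wise against a plain tuple.
    if sys.version_info < min_version:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted} or newer is needed, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

An empty result only means the interpreter and `pipenv` were found; it does not guarantee that `pipenv install` will succeed.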
- First clone the repository to your computer.
git clone git@github.com:derlem/susamuru.git
- Go to directory /susamuru/susamuru
cd susamuru/susamuru
- Install dependencies
pipenv install
- Activate the pipenv shell
pipenv shell
- Download the latest available dump of the Turkish (tr) Wikipedia pages. Here is the link
- Extract the dump to
/susamuru/susamuru/dumps
folder.
- Now you are good to go. Start the execution with:
pipenv run python susamuru.py
- After the execution, you should be able to see the output file in
susamuru/susamuru/output
folder.
- In countries where access to Wikipedia is restricted, consider using a VPN.
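The dump extraction step above can also be done from Python instead of an archive tool. The sketch below assumes the dump was downloaded as a `.bz2` file, which is how Wikimedia distributes its XML dumps. The function name `extract_dump` and the archive filename in the usage comment are illustrative assumptions and are not taken from this repository.

```python
import bz2
import shutil
from pathlib import Path


def extract_dump(archive_path, dumps_dir):
    """Decompress a .bz2 dump into dumps_dir and return the extracted file's path.

    Hypothetical helper, not part of the susamuru code base.
    """
    archive = Path(archive_path)
    target_dir = Path(dumps_dir)
    # Create the dumps folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Dropping the trailing ".bz2" gives the extracted file name.
    target = target_dir / archive.stem
    # Copy in 1 MiB chunks so a multi-gigabyte dump is never held in memory.
    with bz2.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)
    return target


# Example call, run from susamuru/susamuru (the archive name is an assumption):
# extract_dump("trwiki-latest-pages-articles.xml.bz2", "dumps")
```

Streaming the copy keeps memory use flat regardless of dump size; which file name `susamuru.py` expects inside the dumps folder is not stated here, so check the script before running it.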