Include option for other languages #6

Open
stefan-mueller opened this issue Mar 25, 2017 · 3 comments

Comments

@stefan-mueller

Thanks for developing this package – it's really helpful. One question: openNLP offers POS taggers for several other languages (http://opennlp.sourceforge.net/models-1.5/).

They can be loaded easily by installing the respective package, for instance for the Dutch model:

install.packages("openNLPmodels.nl", repos = "http://datacube.wu.ac.at/", type = "source")

If I want to use these languages for my POS tagging, in openNLP I simply specify the language for the annotators (see e.g. ?Maxent_POS_Tag_Annotator).

Thus, could we implement this option for tidypos as well? I assume we need to change lines 82 and 83 in the tag_pos.R file to:

PTA <- openNLP::Maxent_POS_Tag_Annotator(language = "en")
WTA <- openNLP::Maxent_Word_Token_Annotator(language = "en")

and add a language option to the function in tag_pos.R. English should remain the default, but it should be possible to change the language.
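A minimal sketch of what such an option could look like, assuming the two annotators are the only place the language matters (which is not confirmed for the package). `make_annotators()` is a hypothetical helper name used for illustration only; it is not part of the package. The check for a package named `openNLPmodels.<language>` follows the naming of the Dutch example above and is likewise an assumption.

```r
# Sketch only: `make_annotators()` is a hypothetical helper, not package code.
# It builds the two openNLP annotators for a user-chosen language, with
# English as the default.
make_annotators <- function(language = "en") {

    # Assumption: non-English models live in a package named
    # openNLPmodels.<language> (e.g. openNLPmodels.nl for Dutch).
    if (!identical(language, "en")) {
        model_pkg <- paste0("openNLPmodels.", language)
        if (!requireNamespace(model_pkg, quietly = TRUE)) {
            stop(
                "Model package '", model_pkg, "' is not installed.",
                call. = FALSE
            )
        }
    }

    # Pass the language through instead of hard-coding "en"
    list(
        PTA = openNLP::Maxent_POS_Tag_Annotator(language = language),
        WTA = openNLP::Maxent_Word_Token_Annotator(language = language)
    )
}
```

Calling `make_annotators("nl")` would then return the Dutch annotators once `openNLPmodels.nl` is installed, while `make_annotators()` keeps the current English behaviour.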

I hope that these changes would do the job, but I am not absolutely sure whether the language option needs to be included in other parts of the function. If you let me know whether more changes are needed or not (if yes, which ones?), I am happy to make a pull request.

@trinker
Owner

trinker commented Mar 26, 2017 via email

@stefan-mueller
Author

Hi Tyler,

Thank you for developing this package and making it so much easier to connect the POS tags to the words. Ok, it makes sense that more functions need to be changed. At the moment I am using spacyr for German and English POS tagging. In the summer or autumn I will need to tag additional languages. I am happy to edit the code and make a PR when I start working on other languages.

I'll keep you posted.
Stefan

@trinker
Owner

trinker commented Oct 3, 2017

I have been looking to push this forward... At the moment I'm unable to get the add-on language extensions to work from the command line (see https://stanfordnlp.github.io/CoreNLP/human-languages.html). The documentation for installing the add-ons isn't clear about where they go or how they are installed.
