Include option for other languages #6
Stefan,
Thanks for your interest in the tagger package. I agree this would be a
nice feature.
This is a bigger lift in that it needs to work the same across coreNLP as
well. Additionally, some of the other functions rely on English and would
need to be upgraded as well. I don't currently have the dev time for this
task. If you or others were willing to address these aspects and do a pull
request this would be much appreciated.
Tyler
On Fri, Mar 24, 2017 at 10:01 PM, Stefan Müller wrote:
Thanks for developing this package – it's really helpful. One question:
*openNLP* offers POS taggers for several other languages (
http://opennlp.sourceforge.net/models-1.5/).
They can be loaded easily by installing the respective package, for
instance for the Dutch model:
install.packages("openNLPmodels.nl", repos = "http://datacube.wu.ac.at/", type = "source")
If I want to use these languages for my POS tagging, in *openNLP* I
simply specify the language for the annotators (see e.g.
?Maxent_POS_Tag_Annotator).
Thus, could we implement this option for tidypos as well? I assume we
need to change lines 82 and 83 in the tag_pos.R
<https://github.com/trinker/tagger/blob/3e7831c6107f0c2c43c4803d985ba1ba1e5c79b0/R/tag_pos.R#L82>
file to:
PTA <- openNLP::Maxent_POS_Tag_Annotator(language = "en")
WTA <- openNLP::Maxent_Word_Token_Annotator(language = "en")
and add a language option to the tag_pos function. We should set
English as the default but make it possible to change the language.
I hope that these changes would do the job, but I am not absolutely sure
whether the language option needs to be included in other parts of the
function. If you let me know whether more changes are needed or not (if
yes, which ones?), I am happy to make a pull request.
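
For illustration, the proposed change could look roughly like the sketch below. It is not the tagger implementation: `tag_pos_lang` is a hypothetical stand-alone helper, and it assumes that the openNLP and NLP packages are installed along with the model package for the requested language (for example openNLPmodels.nl for "nl"), and that this model package provides sentence, token, and POS models. It only shows a `language` argument, defaulting to English, being passed through to each annotator.

```r
# Hypothetical helper, not part of tagger: POS-tag one text with a
# selectable language. Requires the openNLP and NLP packages plus the
# openNLPmodels.<language> package for any non-English language.
tag_pos_lang <- function(text, language = "en") {
  s <- NLP::as.String(text)

  # Each annotator takes the same `language` argument.
  sent_ann <- openNLP::Maxent_Sent_Token_Annotator(language = language)
  word_ann <- openNLP::Maxent_Word_Token_Annotator(language = language)
  pos_ann  <- openNLP::Maxent_POS_Tag_Annotator(language = language)

  # Sentence and word annotations must exist before POS tagging.
  a <- NLP::annotate(s, list(sent_ann, word_ann, pos_ann))

  # Keep the word-level annotations and pull out their POS feature.
  words <- subset(a, type == "word")
  tags <- vapply(words$features, function(f) f$POS, character(1))

  # Named character vector: names are the tokens, values are the tags.
  stats::setNames(tags, s[words])
}
```

A call such as `tag_pos_lang("Dit is een zin.", language = "nl")` would then use the Dutch models, provided they are installed. As noted in the reply above, other functions in the package rely on English, so a change like this on its own would probably not be sufficient.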
Hi Tyler, thank you for developing this package and making it so much easier to connect the POS tags to the words. OK, it makes sense that more functions need to be changed. At the moment I am using spacyr for German and English POS tagging. In the summer or autumn I will need to tag additional languages. I am happy to edit the code and make a PR when I start working on other languages. I will keep you posted.
I have been looking to push this forward, but at the moment I'm unable to get the add-on language extensions to work from the command line: https://stanfordnlp.github.io/CoreNLP/human-languages.html
The documentation for installing the add-ons isn't clear about where they should go or how they are installed.