ARK-style twitter tags and direct universal tags #5

Open
brendano opened this issue Oct 16, 2016 · 2 comments

Comments

@brendano
Contributor

brendano commented Oct 16, 2016

Make it work for Twitter. Don't bother wrapping the ARK tagger; instead, make it work when called as get_phrases(pos=..., tokens=...).

Just take the bare one-character tags (Gimpel et al. 2011), so there's no need for the Coarse* conversion layer that the old openfst/foma/pyfst version had. And while we're at it, why not use the all-caps Petrov (universal) tags directly too? Hopefully there are no naming conflicts between all these tag systems?
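A minimal Python sketch of the direct-mapping idea, which also checks the naming-conflict question for the ARK tagset (Gimpel et al. 2011), the universal tagset (Petrov et al. 2012) and, assuming PTB input stays supported, the Penn Treebank tags. The coarse alphabet N/A/D/P/V/O, the mapping tables and the names `coarsen`, `shared_names` and `real_conflicts` are all hypothetical illustrations, not the project's actual tables or API.

```python
"""Sketch: coarsen ARK Twitter tags and universal tags directly, with no
Coarse* intermediate layer, and check the tagsets for name clashes.

The coarse alphabet (N, A, D, P, V, O) and every mapping choice below are
hypothetical illustrations, not the project's real tables.
"""
from itertools import combinations
from typing import Dict, List, Set, Tuple

# ARK Twitter tagset (Gimpel et al. 2011): 25 one-character tags.
ARK_TAGS: Set[str] = set("NOS^ZLMVAR!DP&TXY#@~UE$,G")

# Universal tagset (Petrov et al. 2012): 12 tags; "." covers all punctuation.
UNIVERSAL_TAGS: Set[str] = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
                            "ADP", "NUM", "CONJ", "PRT", ".", "X"}

# Penn Treebank tags, including punctuation (assumed to stay supported).
PTB_TAGS: Set[str] = {
    "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD",
    "NN", "NNS", "NNP", "NNPS", "PDT", "POS", "PRP", "PRP$", "RB", "RBR",
    "RBS", "RP", "SYM", "TO", "UH", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ",
    "WDT", "WP", "WP$", "WRB", "#", "$", "''", "``", "(", ")", ",", ".", ":",
    "-LRB-", "-RRB-",
}

TAGSETS: Dict[str, Set[str]] = {
    "ark": ARK_TAGS, "universal": UNIVERSAL_TAGS, "ptb": PTB_TAGS,
}

# Hypothetical coarse maps; any tag not listed coarsens to "O" (other).
COARSE: Dict[str, Dict[str, str]] = {
    "ark": {"N": "N", "^": "N", "A": "A", "$": "A", "D": "D", "P": "P",
            "V": "V"},
    "universal": {"NOUN": "N", "ADJ": "A", "NUM": "A", "DET": "D",
                  "ADP": "P", "VERB": "V"},
    "ptb": {"NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N", "JJ": "A",
            "JJR": "A", "JJS": "A", "CD": "A", "DT": "D", "IN": "P",
            "TO": "P", "VB": "V", "VBD": "V", "VBG": "V", "VBN": "V",
            "VBP": "V", "VBZ": "V"},
}


def coarsen(tags: List[str], tagset: str) -> str:
    """Map a tag sequence to a coarse string. The caller names the tagset,
    because tags such as "$" or "X" mean different things in different sets."""
    table = COARSE[tagset]
    return "".join(table.get(tag, "O") for tag in tags)


def shared_names() -> Dict[Tuple[str, str], Set[str]]:
    """Tag names that appear in both tagsets of each pair."""
    return {(a, b): TAGSETS[a] & TAGSETS[b]
            for a, b in combinations(sorted(TAGSETS), 2)}


def real_conflicts() -> Dict[Tuple[str, str], Set[str]]:
    """Shared names that coarsen differently, i.e. clashes that would matter
    if the tagset were guessed from the tags instead of passed explicitly."""
    return {
        (a, b): {t for t in shared
                 if COARSE[a].get(t, "O") != COARSE[b].get(t, "O")}
        for (a, b), shared in shared_names().items()
    }
```

With these tables, the only name the ARK and universal tagsets share is X (ARK: existential there / predeterminers; universal: other), and both coarsen it to O. ARK also shares #, $ and , with PTB punctuation tags; only $ coarsens differently here (ARK numeral vs. PTB dollar sign), which is one argument for passing the tagset explicitly rather than guessing it from the tags.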

Backburner: see what NLTK's tagset conversion systems look like now (@nschneid submitted something a while back).

@brendano
Contributor Author

More TODO: coarsen_POS_tags.R also needs to be updated. The current Coarse* inputs won't do anything, since these code paths never normalize to Coarse*; only the old pre-openfst preprocessor did that, and we've ditched it.

More TODO: write bilingual tests for POS coarsening.
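One possible shape for these tests, as a hedged Python sketch: assuming "bilingual" means the Python and R coarsening code should agree, both test suites could read the same tab-separated fixture of (tagset, input tags, expected coarse string) cases. The fixture format, the coarse strings, `load_cases`, `run_cases` and the stand-in coarsener are all hypothetical.

```python
"""Sketch: one language-neutral fixture for POS-coarsening tests.

Assumption: the Python and R coarseners should agree, so both test suites
read the same tab-separated cases. The fixture format, the stand-in
coarsener and all names here are hypothetical.
"""
import csv
import io
from typing import Callable, List, Tuple

# Columns: tagset, space-separated input tags, expected coarse string.
# In a real repo this text would live in a shared file read by both suites.
FIXTURE_TSV = (
    "ptb\tDT JJ NN\tDAN\n"
    "universal\tDET ADJ NOUN\tDAN\n"
    "ark\tD A N\tDAN\n"
    "ark\t$ N\tAN\n"
    "ptb\t$ CD\tOA\n"
)

Case = Tuple[str, List[str], str]


def load_cases(text: str) -> List[Case]:
    """Parse fixture rows; QUOTE_NONE keeps PTB quote tags such as '' intact."""
    reader = csv.reader(io.StringIO(text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return [(row[0], row[1].split(), row[2]) for row in reader if row]


def run_cases(coarsen: Callable[[List[str], str], str],
              cases: List[Case]) -> List[str]:
    """Return one message per failing case; an empty list means all passed."""
    failures = []
    for tagset, tags, expected in cases:
        got = coarsen(tags, tagset)
        if got != expected:
            failures.append(
                f"{tagset} {' '.join(tags)}: expected {expected!r}, got {got!r}")
    return failures


# Hypothetical stand-in so the sketch runs; a real suite would import the
# project's Python coarsener instead.
_STAND_IN = {
    ("ptb", "DT"): "D", ("ptb", "JJ"): "A", ("ptb", "NN"): "N",
    ("ptb", "CD"): "A", ("universal", "DET"): "D",
    ("universal", "ADJ"): "A", ("universal", "NOUN"): "N",
    ("ark", "D"): "D", ("ark", "A"): "A", ("ark", "N"): "N",
    ("ark", "$"): "A",
}


def stand_in_coarsen(tags: List[str], tagset: str) -> str:
    """Look up (tagset, tag) pairs; unknown tags coarsen to "O"."""
    return "".join(_STAND_IN.get((tagset, tag), "O") for tag in tags)
```

The R suite would read the same file and compare against its own coarsening function, so both languages are pinned to a single list of expected outputs, including cases like "$" whose meaning depends on the tagset.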

@brendano
Contributor Author

brendano commented Feb 4, 2018

Need to look into: NLTK now has some tagset conversion methods: http://www.nltk.org/_modules/nltk/tag/mapping.html

Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant