Add support for converting a part of speech into another #5
Comments
Hello! Conversion between POS categories is a feature I have seen requested elsewhere, but it seems like a tricky one to tackle. Sadly, I don't have the time needed to look into it.

As for your side note: I consider this project to be a combination of a morphological analyzer and a morphological generator. The former splits a word into a root and "parts of words" (e.g. "huggable" -> "hug" + "-able"), while the latter does the exact opposite (e.g. "hug" + "-able" -> "huggable"). See Chapter 2: Preliminaries from my thesis for more information. (Note: in my thesis I refer to "Inflexion", but this was later renamed to "Inflex", i.e. this work.)

Inflex, in theory, performs morphological analysis, extracts the root, replaces the additional "parts of words" with something else, and then applies the morphological generation step to reconstruct a word. A lemmatizer is essentially the first step, where we don't care about saving the additional "parts of words". So I would say that Inflex is kind of like a lemmatizer followed by a morphological generator, although the implementation usually merges the two steps.
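To make the analyzer/generator split concrete, here is a minimal toy sketch in Python. It is not Inflex's implementation or API: the function names (`analyze`, `generate`, `lemmatize`, `convert`), the tiny suffix list, and the consonant-doubling heuristic are all assumptions made for illustration, and the heuristic mishandles many real words.

```python
# Toy sketch only: this is NOT how Inflex works internally.
# All names and rules below are hypothetical and cover only a few cases.
VOWELS = set("aeiou")


def analyze(word, suffixes=("able", "ing", "ed", "s")):
    """Morphological analysis: split a word into (root, suffix).

    Returns (word, "") when no known suffix matches.
    """
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            root = word[: -len(suffix)]
            # Naively undo consonant doubling: "hugg" -> "hug".
            if len(root) >= 3 and root[-1] == root[-2] and root[-1] not in VOWELS:
                root = root[:-1]
            return root, suffix
    return word, ""


def generate(root, suffix):
    """Morphological generation: attach a suffix to a root.

    Doubles the final consonant of a consonant-vowel-consonant root
    before a vowel-initial suffix ("hug" + "able" -> "huggable").
    """
    if (
        suffix
        and suffix[0] in VOWELS
        and len(root) >= 3
        and root[-1] not in VOWELS
        and root[-1] not in "wxy"
        and root[-2] in VOWELS
        and root[-3] not in VOWELS
    ):
        root += root[-1]
    return root + suffix


def lemmatize(word):
    """A lemmatizer is the analysis step with the suffix thrown away."""
    return analyze(word)[0]


def convert(word, new_suffix):
    """Analyze, swap the suffix, then generate."""
    root, _ = analyze(word)
    return generate(root, new_suffix)
```

In this sketch, `convert("huggable", "ing")` analyzes the word into `("hug", "able")`, discards the old suffix, and regenerates with the new one, which mirrors the "lemmatizer followed by a morphological generator" description above.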
Thanks for the answer, I should look at your thesis when I get the time! I wonder if there are other tools in the wild for POS conversion; it's hard to search for despite being a foundational task. I can't believe nobody attempted to solve these problems back in the 90s. That code has probably been long forgotten by everyone.
I don't know of any tool that does this, sadly. I'm well aware of WordNet and NLTK, but that corpus does not easily allow these POS changes. I agree that it's a shame, but it's a difficult problem.
I read part of your thesis. It is very well written and the amount of detail is excellent! Quite impressive that you cited a paper from 1943!
I'm afraid that no improvements on those sections have been made.
@tomaarsen Friendly ping!
For example:
Verb to noun:
die -> death
Adjective to adverb:
Beautiful -> beautifully
Make it happen, please!!
My project would benefit a lot from those abilities :)
Your library has the potential to become a cornerstone of NLU and NLG projects, pushing back the frontier of what can be achieved.
Side note: Did it ever occur to you that your project is the inverse function of a lemmatizer? It's kind of obvious in retrospect, but maybe there are (or not) insights to derive from this and from the methods of state-of-the-art lemmatizers.
In other words, I expect a SOTA lemmatizer to have some "understanding" of inflexions in order to reverse them accurately (unlike a word stemmer), and this might slightly overlap with the algorithms relevant to your Inflector.
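As a rough illustration of the two requested conversions, the sketch below combines a small irregular-form table (for cases like die -> death) with a few spelling rules for adjective -> adverb. Everything here is hypothetical: `convert_pos`, `adjective_to_adverb`, and the `IRREGULAR` table are not part of Inflex or any existing library, and the rules have many exceptions in real English.

```python
# Hypothetical illustration of the requested feature; not an existing API.
# Irregular derivations cannot be produced by rule, so they need a lookup.
IRREGULAR = {
    ("die", "verb", "noun"): "death",
    ("good", "adjective", "adverb"): "well",
}


def adjective_to_adverb(adjective):
    """Apply a few common spelling rules; many real words are exceptions."""
    word = adjective.lower()
    if word.endswith("ic"):
        return word + "ally"  # basic -> basically
    if word.endswith("le") and len(word) > 2 and word[-3] not in "aeiou":
        return word[:-1] + "y"  # gentle -> gently
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ily"  # happy -> happily
    return word + "ly"  # beautiful -> beautifully


def convert_pos(word, source, target):
    """Convert a word between parts of speech: lookup first, then rules."""
    key = (word.lower(), source, target)
    if key in IRREGULAR:
        return IRREGULAR[key]
    if (source, target) == ("adjective", "adverb"):
        return adjective_to_adverb(word)
    raise ValueError(f"no rule for {source} -> {target}: {word!r}")
```

The lookup-then-rules order reflects why the task is hard: regular patterns such as "-ly" are easy to encode, while pairs like die/death only work if someone has listed them.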