One limitation of phrasemachine is that it returns overlapping spans, which may be unsuitable for some use cases. For instance, for the sentence "Kim Kardashian West will attend", phrasemachine will return "Kim Kardashian", "Kim Kardashian West", and "Kardashian West". For at least some phrasemachine users this is undesirable: they just want a single, canonical span (e.g. "Kim Kardashian").
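To make "overlapping" concrete, here is a minimal sketch that groups spans into clusters of transitively overlapping spans, so a caller could keep one span per cluster. It assumes spans are given as hypothetical `(start, end)` token offsets with `end` exclusive; this is not phrasemachine's actual output format, and `group_overlapping` is an illustrative helper, not part of the library.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end) token offsets, end exclusive.
Span = Tuple[int, int]


def group_overlapping(spans: List[Span]) -> List[List[Span]]:
    """Cluster spans so each cluster is a maximal run of overlapping spans."""
    groups: List[List[Span]] = []
    current_end = -1  # end offset of the cluster currently being built
    for start, end in sorted(spans):
        if groups and start < current_end:
            # Overlaps the current cluster: extend it.
            groups[-1].append((start, end))
            current_end = max(current_end, end)
        else:
            # Starts at or after the current cluster's end: new cluster.
            groups.append([(start, end)])
            current_end = end
    return groups


# "Kim Kardashian West will attend" -> tokens 0..4
tokens = ["Kim", "Kardashian", "West", "will", "attend"]
spans = [(0, 2), (0, 3), (1, 3)]  # Kim Kardashian, Kim Kardashian West, Kardashian West
print(group_overlapping(spans))
```

All three spans land in a single cluster, which is exactly the set a user would want collapsed to one answer. Spans that merely touch (e.g. `(0, 2)` and `(2, 4)`) are treated as separate clusters.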
There are definitely cases where even determining what the canonical name should be is tricky. For example, "Sichuan hot pot dishes are delicious" yields 'sichuan hot pot', 'sichuan hot pot dishes', 'hot pot', 'hot pot dishes', and 'pot dishes'. [shrugs] And in other cases I could imagine all sorts of complex semantic issues coming into play.
But our current solution is basically to do nothing. I wonder whether some users would prefer to have a decision made for them. Maybe we should offer a small, simple model, trained on canonical Wikipedia titles, for choosing among overlapping spans. In the first case, "Kim Kardashian" would be the "correct" answer because that is the title of the Wikipedia page. I would expect this to at least catch obviously terrible canonical names, e.g. "Rev. Jean-Bertrand" for "Rev. Jean-Bertrand Aristide".
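As a rough baseline for the idea above, the sketch below replaces the proposed trained model with a plain lookup: within one group of overlapping spans, it picks the longest span whose text matches a known title. The `known_titles` set is a hypothetical stand-in for a list of Wikipedia titles, and `pick_canonical` is an illustrative helper, not a phrasemachine API. When nothing matches it returns `None`, so a caller can fall back to today's behavior of returning every span.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical span representation: (start, end) token offsets, end exclusive.
Span = Tuple[int, int]


def pick_canonical(
    tokens: List[str], group: Iterable[Span], known_titles: Set[str]
) -> Optional[Span]:
    """Return the longest span in `group` whose text is a known title, else None.

    `known_titles` is assumed to hold lowercased titles (a stand-in for
    canonical Wikipedia page titles); this is a lookup, not a trained model.
    """
    matches = [
        (start, end)
        for start, end in group
        if " ".join(tokens[start:end]).lower() in known_titles
    ]
    if not matches:
        return None  # no decision: keep all overlapping spans as today
    # Prefer the longest matching span; break ties by the earliest start.
    return max(matches, key=lambda s: (s[1] - s[0], -s[0]))


tokens = ["Kim", "Kardashian", "West", "will", "attend"]
titles = {"kim kardashian"}  # hypothetical title list
print(pick_canonical(tokens, [(0, 2), (0, 3), (1, 3)], titles))
```

Because it only looks up exact titles, this baseline cannot rank spans it has never seen, which is where a small learned model could add value.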
Should this be attempted? I guess one con is that it makes phrasemachine more complex. Another is that it might not work, or might have complications we can't foresee. But it seems like a good standalone project for someone looking to help out with phrasemachine or to tackle a contained NLP problem.