Find canonical names from Wikipedia titles? #20

AbeHandler · 2019-07-18T14:47:04Z

One limitation of phrasemachine is that it returns overlapping spans, which may be unsuitable for some use cases. For instance, phrasemachine will return "Kim Kardashian", "Kim Kardashian West" and "Kardashian West" for the sentence "Kim Kardashian West will attend". For at least some phrasemachine users, this will be undesirable: they will just want a single, canonical span (e.g. "Kim Kardashian").

There are definitely cases where even determining what the canonical name should be is tricky, e.g. "Sichuan hot pot dishes are delicious" => 'sichuan hot pot', 'sichuan hot pot dishes', 'hot pot', 'hot pot dishes', 'pot dishes'. [shrugs] And in other cases I could imagine all sorts of complex semantic issues coming into play.

But our current solution is basically to do nothing. I wonder if some users would prefer that some decision be made for them. Maybe we should offer a small, simple model trained on canonical Wikipedia titles to choose among overlapping spans. In the first case, "Kim Kardashian" would be the "correct" answer b/c that is the Wikipedia page. I would imagine this would at least identify obviously terrible canonical names, e.g. "Rev. Jean-Bertrand" for "Rev. Jean-Bertrand Aristide".
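To make the idea concrete, here is a rough sketch of the simplest possible baseline: a plain title lookup, not the trained model proposed above. Everything in it is an assumption for illustration: `pick_canonical` is a hypothetical helper (not part of phrasemachine), the title set is a tiny hand-written stand-in for a real dump of Wikipedia titles, and the "longest span" fallback is just one arbitrary tie-breaking choice.

```python
def pick_canonical(candidates, wikipedia_titles):
    """Pick one span from a non-empty list of overlapping candidate spans.

    Hypothetical helper: prefers candidates that exactly match a known title
    (case-insensitive); otherwise falls back to all candidates.
    """
    normalized_titles = {title.lower() for title in wikipedia_titles}
    matches = [c for c in candidates if c.lower() in normalized_titles]
    pool = matches if matches else candidates
    # Among the remaining spans, take the one with the most tokens;
    # max() keeps the earliest candidate when there is a tie.
    return max(pool, key=lambda span: len(span.split()))


# Toy stand-in for a real set of Wikipedia page titles (assumption).
titles = {"Kim Kardashian", "Jean-Bertrand Aristide"}

# The overlapping spans from the example sentence in this issue.
spans = ["Kim Kardashian", "Kim Kardashian West", "Kardashian West"]
print(pick_canonical(spans, titles))
```

With the toy title set, this selects "Kim Kardashian" for the first example. It does nothing useful for cases like the hot pot one, where several candidates (or none) could plausibly be titles, which is where a learned model might help.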

Should this be attempted? I guess one con is that it would make phrasemachine more complex. Another con is that it might not work, or might have complications that we can't foresee. But it seems like a good standalone project for someone looking to help out with phrasemachine or tackle a contained NLP problem.

AbeHandler · 2019-11-23T14:45:55Z

Some related work, for record-keeping: https://arxiv.org/pdf/1906.06703.pdf

I seem to recall there was another paper from Stanovsky or Dagan or both on this but I can't seem to find it.

AbeHandler added the question label Jul 18, 2019
