IndexError when using disambiguate() with maxsim algorithm #59
Comments
Getting the same error with similar usage in Python 3.8, pywsd 1.2.4.
It gives an index-out-of-bounds error in pywsd.similarity.max_similarity(). Scotch-tape patch with:
in pywsd.similarity
is the "fix". It doesn't really resolve the underlying issue, though.
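As an illustration of this kind of stopgap (a hypothetical sketch, not the patch mentioned above and not part of pywsd), a wrapper can turn the IndexError into a None result. The names none_on_index_error and toy_algorithm are made up for this example:

```python
from functools import wraps


def none_on_index_error(algorithm):
    """Wrap a function so that an IndexError becomes a None result."""
    @wraps(algorithm)
    def wrapper(*args, **kwargs):
        try:
            return algorithm(*args, **kwargs)
        except IndexError:
            # Nothing to choose from: report "no sense found" instead of crashing.
            return None
    return wrapper


@none_on_index_error
def toy_algorithm(candidates):
    # Stand-in for a WSD algorithm that picks the top-ranked candidate.
    return candidates[0]


print(toy_algorithm(["sense_a", "sense_b"]))
print(toy_algorithm([]))
```

This only hides the symptom: the caller gets None instead of a sense, and any unrelated IndexError raised inside the wrapped function is swallowed too.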
I was also getting this error. I found that it was because the incorrect … Why the wrong … However, we can still catch a bad …

from nltk.corpus import wordnet as wn

from pywsd import disambiguate, sim
from pywsd.tokenize import word_tokenize
from pywsd.utils import lemmatize


def max_similarity_fix(context_sentence: str, ambiguous_word: str, option="path",
                       lemma=True, context_is_lemmatized=False, pos=None,
                       best=True, from_cache=False) -> "wn.Synset":
    r"""
    Perform WSD by maximizing the sum of maximum similarity between possible
    synsets of all words in the context sentence and the possible synsets of the
    ambiguous words (see https://ibin.co/4gG9zUlejUUA.png):
    {argmax}_{synset(a)}(\sum_{i}^{n}{{max}_{synset(i)}(sim(i,a))})

    :param context_sentence: String, a sentence.
    :param ambiguous_word: String, a single word.
    :return: If best, returns only the best Synset, else returns a dict.
    """
    ambiguous_word = lemmatize(ambiguous_word)
    # Try the requested POS first; fall back to any POS if that finds nothing.
    syn = wn.synsets(ambiguous_word, pos=pos) or wn.synsets(ambiguous_word)
    # If ambiguous word not in WordNet return None
    if not syn:
        return None
    if context_is_lemmatized:
        context_sentence = word_tokenize(context_sentence)
    else:
        context_sentence = [lemmatize(w) for w in word_tokenize(context_sentence)]
    result = {}
    for i in syn:
        result[i] = 0
        for j in context_sentence:
            _result = [0]
            for k in wn.synsets(j):
                _result.append(sim(i, k, option))
            result[i] += max(_result)
    if option in ["res", "resnik"]:  # lower score = more similar
        result = sorted([(v, k) for k, v in result.items()])
    else:  # higher score = more similar
        result = sorted([(v, k) for k, v in result.items()], reverse=True)
    return result[0][1] if best else result

You can see this works for

sentence = "should sentiment. deep-water. co-beneficiary."
print(disambiguate(sentence, algorithm=max_similarity_fix))

I am uncertain as to whether using an unspecified …
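The core of the fix above is the `or` fallback: look the lemma up with the tagged POS, retry without a POS if that yields nothing, and return None when both lookups come back empty. A minimal sketch of that pattern, using a made-up toy lexicon instead of WordNet so it runs without NLTK data (TOY_LEXICON, toy_synsets and candidate_senses are hypothetical names):

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-in for WordNet: (lemma, pos) -> sense names.
# The entries are invented for illustration only.
TOY_LEXICON: Dict[Tuple[str, str], List[str]] = {
    ("sentiment", "n"): ["sentiment.n.01", "sentiment.n.02"],
}


def toy_synsets(lemma: str, pos: Optional[str] = None) -> List[str]:
    """Return the senses of lemma, optionally restricted to one part of speech."""
    return [
        sense
        for (entry_lemma, entry_pos), senses in TOY_LEXICON.items()
        if entry_lemma == lemma and (pos is None or entry_pos == pos)
        for sense in senses
    ]


def candidate_senses(lemma: str, pos: Optional[str] = None) -> Optional[List[str]]:
    # An empty list is falsy, so `or` retries the lookup without the POS filter.
    senses = toy_synsets(lemma, pos=pos) or toy_synsets(lemma)
    # Still empty: the lemma is unknown, so signal that with None.
    return senses or None


print(candidate_senses("sentiment", pos="v"))  # wrong POS tag, fallback applies
print(candidate_senses("would", pos="v"))      # not in the toy lexicon at all
```

With the guard in place the caller has to handle a None result, but the ranking step never runs on an empty candidate list.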
I'm using Google Colab:
s = "would sentiment"
disambiguate(s, algorithm=maxsim, similarity_option='path', keepLemmas=True)
The same happens with "may sentiment", "might sentiment", "must sentiment", ...
IndexError Traceback (most recent call last)
in ()
1 s = "would sentiment"
----> 2 disambiguate(s, algorithm=maxsim, similarity_option='path', keepLemmas=True)
/usr/local/lib/python3.6/dist-packages/pywsd/allwords_wsd.py in disambiguate(sentence, algorithm, context_is_lemmatized, similarity_option, keepLemmas, prefersNone, from_cache, tokenizer)
43 synset = algorithm(lemma_sentence, lemma, from_cache=from_cache)
44 elif algorithm == max_similarity:
---> 45 synset = algorithm(lemma_sentence, lemma, pos=pos, option=similarity_option)
46 else:
47 synset = algorithm(lemma_sentence, lemma, pos=pos, context_is_lemmatized=True,
/usr/local/lib/python3.6/dist-packages/pywsd/similarity.py in max_similarity(context_sentence, ambiguous_word, option, lemma, context_is_lemmatized, pos, best)
125 result = sorted([(v,k) for k,v in result.items()],reverse=True)
126
--> 127 return result[0][1] if best else result
IndexError: list index out of range
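The last frame of the traceback fails on result[0][1], which can only happen when the sorted list is empty, i.e. when no candidate synsets were scored. A small stand-alone sketch of that failure mode, with plain strings standing in for synsets (pick_best is a hypothetical name, not a pywsd function):

```python
def pick_best(scores):
    """Mimic the final ranking step shown in the traceback."""
    ranked = sorted([(v, k) for k, v in scores.items()], reverse=True)
    return ranked[0][1]  # raises IndexError when ranked is empty


# With at least one scored candidate, the top-ranked one is returned.
print(pick_best({"sense_a": 0.2, "sense_b": 0.7}))  # sense_b

# With no candidates at all, indexing the empty list fails.
try:
    pick_best({})
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```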