If a term, for example "Bourgogne", appears multiple times in a single document, such as "Coche-Dury Bourgogne Chardonay 2005, Bourgogne, France", your algorithm appends the same entry to the IrIndex.index and IrIndex.tf lists once per occurrence. Because len(self.tf[term]) is then used as the number of documents containing the term, these duplicate appends inflate that count and distort the IDF calculation in the following code:
idf = log( float( len(self.documents) ) / float( len(self.tf[term]) ) )
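To make the distortion concrete, here is a small sketch of the same formula applied to a duplicated and a deduplicated entry list. The collection size of 4 and the list contents are assumptions chosen for illustration, not values from the project.

```python
from math import log


def idf(num_documents, entries):
    # Same formula as above: len(entries) stands in for the number of
    # documents containing the term.
    return log(float(num_documents) / float(len(entries)))


num_documents = 4      # assumed collection size, for illustration only
duplicated = [0, 0]    # document 0 recorded twice because the term repeats
deduplicated = [0]     # document 0 recorded once

idf_duplicated = idf(num_documents, duplicated)      # log(4 / 2), about 0.693
idf_deduplicated = idf(num_documents, deduplicated)  # log(4 / 1), about 1.386
```

With the duplicate entry, the term looks twice as common as it really is, so its IDF weight drops by log(2).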
I changed the code from:
for term in terms:
    if term not in self.index:
        self.index[term] = []
        self.tf[term] = []
to:
for term in terms:
    if term not in self.index:
        self.index[term] = []
        self.tf[term] = []
by skipping the subsequent append operations when the term, together with its containing document, is already recorded in the IrIndex object.
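One way to get this behavior is sketched below. It is not the project's code: TinyIndex, add_document, and the tokenization are hypothetical stand-ins for IrIndex, and the second document is made up for the example. The idea is to count each term once per document before appending, so every (term, document) pair yields exactly one entry.

```python
from collections import Counter
from math import log


class TinyIndex:
    """Hypothetical stand-in for IrIndex; the data layout is an assumption."""

    def __init__(self):
        self.documents = []
        self.index = {}  # term -> list of document ids, one per document
        self.tf = {}     # term -> list of per-document term counts

    def add_document(self, text):
        doc_id = len(self.documents)
        self.documents.append(text)
        # Counter collapses repeated terms, so each (term, document)
        # pair is appended exactly once.
        counts = Counter(text.lower().replace(",", " ").split())
        for term, count in counts.items():
            if term not in self.index:
                self.index[term] = []
                self.tf[term] = []
            self.index[term].append(doc_id)
            self.tf[term].append(count)

    def idf(self, term):
        # len(self.tf[term]) now equals the number of documents with the term.
        return log(float(len(self.documents)) / float(len(self.tf[term])))


idx = TinyIndex()
idx.add_document("Coche-Dury Bourgogne Chardonay 2005, Bourgogne, France")
idx.add_document("Chateau Margaux 2000, Bordeaux, France")  # made-up document
```

Here "bourgogne" occurs twice in the first document but produces a single entry with a count of 2, so the document frequency used by idf stays at 1.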