Open
Description
I guess I found a bug in the way the scoring is done.
For example a article page from cnn:
DEBUG:root:Candidate p#cnnContentContainer.cnn_storyarea with score 163.5
DEBUG:root:Candidate p#.cnn_contentarea with score 138.0
DEBUG:root:Candidate p#cnnContainer. with score 118.5
DEBUG:root:Candidate body#. with score 113.5
DEBUG:root:Candidate p#.cnn_strycntntlft with score 111.0
all of those 5 candidates are somehow childs of eachother (body#->p.*). So it happens, that the result is showing to much text which is not needed.
An idea would be to remove child nodes from the parent before calculating the score.
Metadata
Metadata
Assignees
Labels
No labels