Check out "Winnowing algorithm" by Dr S Baltes #186

Bhargav-Rao · 2019-03-02T19:02:59Z

In issue #28, I also pointed to our implementation of the Winnowing algorithm, which serves a similar purpose. We already evaluated how suitable it is for comparing Stack Overflow posts.

Originally posted by @sbaltes in #160 (comment)

Bhargav-Rao · 2019-03-02T19:03:43Z

And here #28 (comment)

In a recent research paper, we evaluated different string similarity metrics to match Stack Overflow code and text blocks to their predecessors. Those results could also be interesting for your projects, because our implementations of the metrics are available on GitHub. You could, for example, use the Winnowing algorithm to compare code blocks.
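For readers unfamiliar with the technique, here is a minimal Python sketch of Winnowing-style fingerprinting for comparing two code blocks. It is not the implementation referred to above: the k-gram size `k`, the window size `w`, the MD5-based hash, the whitespace/lowercase normalization, and the Jaccard comparison of fingerprint sets are all assumptions chosen for illustration.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-gram of the text after stripping whitespace and lowercasing."""
    normalized = "".join(text.split()).lower()
    hashes = []
    for i in range(len(normalized) - k + 1):
        gram = normalized[i:i + k]
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        # Keep only the first 4 bytes so the hashes are small integers.
        hashes.append(int.from_bytes(digest[:4], "big"))
    return hashes


def winnow(text, k=5, w=4):
    """Return the set of fingerprints: the minimum hash of each window of w hashes."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) < w:
        # Too short for a full window: fall back to a single fingerprint.
        return {min(hashes)}
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        fingerprints.add(min(hashes[start:start + w]))
    return fingerprints


def winnowing_similarity(a, b, k=5, w=4):
    """Jaccard similarity of the two fingerprint sets (an assumed comparison)."""
    fa, fb = winnow(a, k, w), winnow(b, k, w)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)
```

Calling `winnowing_similarity(old_block, new_block)` returns a value between 0 and 1; how that value maps to the scores reported in this thread depends on the real implementation and its parameters.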

Bhargav-Rao · 2019-03-02T21:39:03Z

An actual score of 0.32 looks quite good for a start.

sbaltes · 2019-03-03T12:20:42Z

Feel free to modify the similarity threshold according to your needs. In our evaluation, we used a combination of metrics and heuristics to match code blocks, so the results may not be completely transferable to your use case. Besides the Winnowing algorithm (which, by the way, I just implemented, not invented :-) ), you may check out simpler metrics such as tokenDiceNormalized, which scored best in a later evaluation.
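As a rough illustration of a token-based Dice metric with an adjustable threshold, here is a small Python sketch. It is not the `tokenDiceNormalized` implementation mentioned above: the whitespace tokenization, the lowercase normalization, the helper names, and the default threshold of 0.5 are all hypothetical.

```python
def token_dice(a, b):
    """Dice coefficient over the sets of whitespace-separated, lowercased tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return 2 * len(tokens_a & tokens_b) / (len(tokens_a) + len(tokens_b))


def is_match(a, b, threshold=0.5):
    """Treat two blocks as matching when the score reaches the (hypothetical) threshold."""
    return token_dice(a, b) >= threshold
```

For example, `token_dice("int x = 1;", "int y = 1;")` shares three of four tokens on each side, giving 2 * 3 / 8 = 0.75. Raising or lowering `threshold` trades missed matches against false positives.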

Bhargav-Rao · 2019-03-03T20:16:03Z

Thanks, @sbaltes, we'll check those out as well. I can somewhat see that the algorithm is trying its best to match within the code blocks. Perhaps we can use it for the code-only checks. It certainly looks very promising at this stage.

(We're testing it out in the Workshop. Let us know if you want to check it out; we will give you write access to the room.)

sbaltes · 2019-03-04T09:26:28Z

Sure, if there is anything I can help you with, just let me know.

sbaltes · 2019-03-04T09:27:52Z

BTW: There is a related discussion on Stack Overflow Meta: https://meta.stackoverflow.com/questions/375761/how-to-handle-code-clones-on-stack-overflow

Bhargav-Rao added a commit that referenced this issue Mar 2, 2019

Starts work on #186 - Added the Winnowing algo to the reasons list.

430a447

FelixSFD added this to the 1.3 milestone Mar 3, 2019

FelixSFD added the enhancement label Mar 3, 2019

FelixSFD assigned Bhargav-Rao Mar 3, 2019
