Check out "Winnowing algorithm" by Dr S Baltes #186
Comments
And here #28 (comment)
Feel free to adjust the similarity threshold to your needs. In our evaluation, we combined several metrics and heuristics to match code blocks, so the results may not transfer completely to your use case. Besides the Winnowing algorithm (which, by the way, I only implemented, not invented :-) ), you could try simpler metrics such as tokenDiceNormalized, which scored best in a later evaluation.
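To make the idea of a token-based Dice metric with a tunable threshold concrete, here is a minimal sketch. It is not the tokenDiceNormalized implementation referred to above: the tokenizer (word characters only), the normalization (lower-casing), the use of token sets, and the names `tokens`, `token_dice`, `is_match` and the default threshold of 0.6 are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Hypothetical normalization: lower-case, then keep runs of word characters."""
    return set(re.findall(r"\w+", text.lower()))


def token_dice(a, b):
    """Dice coefficient over token sets: 2 * |A & B| / (|A| + |B|)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        # Two token-free inputs are treated as identical (an assumption).
        return 1.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def is_match(a, b, threshold=0.6):
    """Treat two blocks as a match when their similarity reaches the threshold."""
    return token_dice(a, b) >= threshold


if __name__ == "__main__":
    print(token_dice("int x = 1;", "int x = 2;"))
    print(is_match("int x = 1;", "String name = null;"))
```

Raising `threshold` makes matching stricter and lowering it makes it more permissive, so a suitable value would have to be found by testing on real posts.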
Thanks, @sbaltes, we'll check those out as well. I can see that the algorithm does its best to match within the code blocks. Perhaps we can use it for the code-only checks. It certainly looks very promising at this stage. (We're testing it in the Workshop. Let us know if you'd like to take a look, and we'll give you write access to the room.)
Sure, if there is anything I can help you with, just let me know.
BTW: There is a related discussion on Stack Overflow Meta: https://meta.stackoverflow.com/questions/375761/how-to-handle-code-clones-on-stack-overflow
In issue #28, I also pointed to our implementation of the Winnowing algorithm, which serves a similar purpose. We already evaluated how suitable it is for comparing Stack Overflow posts.
Originally posted by @sbaltes in #160 (comment)
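For readers unfamiliar with Winnowing, the following is a rough sketch of the general technique: hash every k-gram of the normalized text, slide a window over the hashes, and keep the minimum hash of each window as a fingerprint. It is not the implementation linked from issue #28. The parameter values `k=5` and `w=4`, the CRC32 hash, the whitespace and case normalization, and the Jaccard comparison of fingerprint sets are all assumptions chosen for illustration.

```python
import zlib


def kgram_hashes(text, k):
    """Hash every k-gram of the text after removing whitespace and lower-casing."""
    s = "".join(text.lower().split())
    return [zlib.crc32(s[i:i + k].encode("utf-8")) for i in range(len(s) - k + 1)]


def winnow(text, k=5, w=4):
    """Return the set of fingerprints: the minimum hash of each window of w hashes."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) < w:
        # Fewer hashes than one full window: keep the single minimum.
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def winnowing_similarity(a, b, k=5, w=4):
    """Jaccard similarity of the two fingerprint sets (an assumed comparison)."""
    fa, fb = winnow(a, k, w), winnow(b, k, w)
    if not fa or not fb:
        # At least one input is too short to fingerprint.
        return 0.0
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    original = "for (int i = 0; i < n; i++) { sum += values[i]; }"
    edited = "for (int i = 0; i < n; i++) { total += values[i]; }"
    print(winnowing_similarity(original, edited))
```

Because only a subset of the hashes is kept, two posts can be compared by their fingerprints rather than by their full text, which is what makes the approach attractive for checking many posts against each other.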