Replies: 3 comments
-
Your idea sounds amazing. Yes, let's do it! From reading the basics of simhash, it sounds like it uses n-grams to detect similarities. If we were to redo search to use trigrams instead of lexemes, I wonder whether it would be straightforward to search, and whether that would be smart enough. Either way, yes, let's do it! Sounds like a great idea.
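To make the trigram idea concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the project's code). The function names `trigrams` and `trigram_similarity` are hypothetical, and Jaccard overlap is an assumed scoring choice, not something settled in this thread.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of 3-character windows in a normalized string."""
    # Lowercase and collapse whitespace so formatting differences matter less.
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        # Both inputs are shorter than three characters: nothing to compare.
        return 0.0
    return len(ta & tb) / len(union)
```

A search built this way would rank existing posts by `trigram_similarity(query, post)`. Whether that is "smart enough" compared with lexemes would need testing on real tips.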
-
Perfect! TBH I don't know much about … Meanwhile, if you already have an idea of how to do it, let me know.
-
FYI, I'm going to convert this issue to a discussion. I'm trying out GitHub Discussions, and I think an idea/proposal like this is where it belongs.
-
Hi 👋
First of all, I want to say that this is a brilliant idea! Not only does it increase the number of open source Elixir projects, it also helps people improve their skills.
What I'm going to suggest here might not be at the top of your priority list, as the project has just started and needs more contributions. However, I think it's an interesting problem to solve, and at some point it will be useful for avoiding duplicates.
The basic idea is that if someone tries to post code that has already been posted, the system shows them the similar posts. It is then the user's decision whether to post the tip or discard it.
To decide whether a piece of code is similar to an existing one, we can use simhash.
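As a rough illustration of the simhash check, here is a minimal sketch in Python (used only for brevity; a real version would be ported to the project's stack). Everything in it is an assumption for illustration: the tokenizer, the 3-token shingles, the MD5-based 64-bit feature hash, and the helper names are hypothetical choices, not an agreed design.

```python
import hashlib
import re


def tokens(code: str) -> list[str]:
    """Split a snippet into word-like tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", code)


def shingles(items: list[str], n: int = 3) -> list[str]:
    """Overlapping runs of n tokens, each joined into one feature string."""
    if len(items) < n:
        return [" ".join(items)] if items else []
    return [" ".join(items[i:i + n]) for i in range(len(items) - n + 1)]


def simhash(code: str, bits: int = 64) -> int:
    """Fingerprint a snippet so that similar snippets tend to differ in few bits."""
    counts = [0] * bits
    for feature in shingles(tokens(code)):
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes are net positive.
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two posts would be flagged as possible duplicates when the Hamming distance between their fingerprints falls below some threshold. The right threshold is not known up front and would have to be tuned on real tips.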
One of the main challenges, which I'm not sure how to address, is making it scalable. As the data grows, going through all the posts and checking each one for similarity sounds inefficient. However, we can:
If you're open to this feature, I'd be happy to get into the details and create a PR!