Replies: 4 comments 1 reply
-
RAPTOR might also be a good choice. The idea is to build a tree-like structure from chunks by using dimensionality reduction and e.g. k-means for clustering. These clusters are then clustered again, and so on. Copying from LinkedIn:
It might also be a good choice to combine both ideas, e.g. use semantic chunking (which seems better than relying on error-prone dimensionality reduction) and build a tree-like structure from those chunks instead.
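The recursive clustering idea above can be sketched roughly as follows. This is only a toy illustration of the tree-building loop, not the actual RAPTOR implementation: real RAPTOR uses UMAP and Gaussian mixture models plus LLM-generated cluster summaries, while this sketch stands in with PCA, k-means, and mean embeddings, and the `build_tree` function and its parameters are made up for the example.

```python
# Toy sketch of RAPTOR-style recursive clustering (assumed names/parameters).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def build_tree(embeddings, k=2, min_nodes=3):
    """Recursively cluster embeddings; each level's cluster means stand in
    for the LLM summaries that real RAPTOR would generate per cluster."""
    levels = [embeddings]
    current = embeddings
    while len(current) > min_nodes:
        # Dimensionality reduction before clustering (the step the comment
        # above calls error-prone on small or noisy data).
        n_comp = min(2, current.shape[1], len(current) - 1)
        reduced = PCA(n_components=n_comp).fit_transform(current)
        n_clusters = max(1, len(current) // k)
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=0).fit_predict(reduced)
        # Mean embedding per cluster becomes a node on the next level up.
        current = np.stack([current[labels == c].mean(axis=0)
                            for c in range(n_clusters)])
        levels.append(current)
    return levels

rng = np.random.default_rng(0)
chunks = rng.normal(size=(8, 16))      # 8 toy chunk embeddings
tree = build_tree(chunks)
print([len(level) for level in tree])  # levels shrink toward the root
```

At query time you would then retrieve against all levels of the tree, so both fine-grained chunks and higher-level summaries can match.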
-
@VarunNSrivastava do you maybe have other ideas, or have you seen any promising demos or similar?
-
Interesting! I'll dig into these demos. I spent a lot of time with a very similar approach and found it somewhat ineffective and computationally expensive. I found that breaking at punctuation was a bit more effective...
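For reference, a minimal sketch of the punctuation-based splitting mentioned above, using a simple regex. The function name and `min_len` parameter are invented for the example; real sentence splitters (e.g. nltk or spaCy) handle abbreviations and other edge cases this does not.

```python
import re

def split_at_punctuation(text, min_len=20):
    # Split after sentence-ending punctuation followed by whitespace.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Merge fragments into the previous chunk while it is shorter than min_len.
    chunks = []
    for part in parts:
        if chunks and len(chunks[-1]) < min_len:
            chunks[-1] += " " + part
        else:
            chunks.append(part)
    return chunks

print(split_at_punctuation("First sentence here. Second one! A third?"))
```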
-
Linking current plans:
-
I saw this post about semantic chunking using sliding chunk windows and think it's a cool approach.
In a nutshell, you slide a window over the text, compute an embedding for each window, and measure the distance between adjacent windows. Using this score, you identify semantic "break points", i.e. the positions with the largest delta.
Combined with a minimum and maximum chunk size, that would make a great alternative to the current chunking functions!
LangChain has already implemented this and gives a good explanation in their docs.
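The break-point idea can be sketched as below, in the spirit of LangChain's SemanticChunker but not its actual API. Everything here is an assumption for illustration: the window is simplified to single sentences, `embed()` is a toy bag-of-letters stand-in for a real embedding model, and the `threshold` and `max_size` parameters are made up (LangChain derives its threshold from percentiles of the distance distribution instead).

```python
import numpy as np

def embed(sentence):
    # Toy stand-in for a real sentence-embedding model:
    # a normalized letter-frequency vector.
    vec = np.zeros(26)
    for ch in sentence.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1
    return vec / (np.linalg.norm(vec) or 1.0)

def semantic_chunks(sentences, threshold=0.3, max_size=5):
    embs = [embed(s) for s in sentences]
    # Cosine distance between each adjacent pair of windows.
    dists = [1 - float(a @ b) for a, b in zip(embs, embs[1:])]
    chunks, current = [], [sentences[0]]
    for sent, d in zip(sentences[1:], dists):
        # Break where the semantic delta is large, or the chunk is full.
        if d > threshold or len(current) >= max_size:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks

# The jump in letter distribution between "aab" and "zzy" is the break point.
print(semantic_chunks(["aaa aaa", "aaa aab", "zzz zzy", "zzz zzz"]))
```

With a real embedding model the same loop applies; only `embed()` and the threshold selection change.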