Added a short explanation of the difference between zeroshot and guid… #2238
…ed topic modeling to both of the respective documentations so that users immediately know that there are two very similar methods for providing pre-defined topics
Thank you for the PR! I understand the need to compare these two variants, which are very similar. Perhaps at some point we will need to compare all variants against each other (maybe in a table of sorts) so that users understand the many options out there and when to use each.
That said, I left a couple of comments to clear up some things here and there.
!!! Note
Difference between Zero-shot and Guided BERTopic:
Guided BERTopic is similar - yet not equivalent - to [Zeros-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html).
Use Guided BERTopic to boost certain keyword's importance. Use [Zeros-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html) to try to categorize documents into predefined topics ("zero-shot topics") before the clustering the remaining, unclassified documents, using the default unsupervised BERTopic topic exploration algorithm.
Suggested change:
- Use Guided BERTopic to boost certain keyword's importance. Use [Zeros-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html) to try to categorize documents into predefined topics ("zero-shot topics") before the clustering the remaining, unclassified documents, using the default unsupervised BERTopic topic exploration algorithm.
+ Use Guided BERTopic to boost the importance of certain keywords. Use [Zeros-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html) to try to categorize documents into predefined topics ("zero-shot topics") before clustering the remaining unclassified documents using the main algorithm of BERTopic.
What I'm missing in the description of the guided approach is that it is not purely focused on increasing a given keyword's importance. Due to its procedure, it also steers certain documents towards certain clusters. Moreover, guided topic modeling is not primarily meant for increasing a keyword's importance; it instead tries to steer documents towards given clusters. The focus here is on seed topics, not seed keywords.
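To make the distinction concrete, here is a rough, library-free sketch of the two ideas as described in this thread. It is a toy illustration under stated assumptions, not BERTopic's actual implementation: the helper names (`zeroshot_assign`, `guided_nudge`), the 2-D vectors standing in for embeddings, the similarity threshold, and the nudge weight are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zeroshot_assign(doc_vecs, topic_vecs, min_similarity):
    """Zero-shot idea (toy): give a document its best predefined topic if it
    is similar enough; otherwise return None so it is left for clustering."""
    labels = []
    for doc in doc_vecs:
        sims = [cosine(doc, topic) for topic in topic_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        labels.append(best if sims[best] >= min_similarity else None)
    return labels


def guided_nudge(doc_vecs, seed_vecs, weight=0.25):
    """Guided idea (toy): pull each document vector towards its closest seed
    topic before clustering, so documents are steered towards those clusters.
    The weight is a hypothetical value chosen for illustration."""
    nudged = []
    for doc in doc_vecs:
        seed = max(seed_vecs, key=lambda s: cosine(doc, s))
        nudged.append([(1 - weight) * x + weight * y for x, y in zip(doc, seed)])
    return nudged


# Hypothetical 2-D "embeddings" for three documents and two predefined topics.
docs = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
topics = [[1.0, 0.0], [0.0, 1.0]]

# Zero-shot: the third document matches neither topic well enough.
labels = zeroshot_assign(docs, topics, min_similarity=0.9)

# Guided: every document is moved closer to its nearest seed topic.
steered = guided_nudge(docs, topics)
```

In this sketch, zero-shot makes a hard assignment and leaves the unmatched document for regular clustering, whereas the guided variant assigns nothing directly and only shifts documents towards the seed topics before clustering takes place.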
Thank you for the changes and for the suggestion regarding the explanation of the guided approach. I'll have a look at this when time allows.
…ween-guided-and-zeroshot
Added a short explanation of the difference between zeroshot and guided topic modeling to both of the respective documentations so that users immediately know that there are two very similar methods for providing pre-defined topics
What does this PR do?
Cross-reference from the zeroshot doc to the guided doc (and vice versa).
Add a short explainer of the differences between zeroshot and guided.
Fixes #2237
Before submitting