Added a short explanation of the difference between zeroshot and guid… #2238

Open
wants to merge 2 commits into
base: master
6 changes: 6 additions & 0 deletions docs/getting_started/guided/guided.md
@@ -1,3 +1,9 @@
!!! Note
Difference between Zero-shot and Guided BERTopic:
Guided BERTopic is similar - yet not equivalent - to [Zero-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html).
Use Guided BERTopic to boost certain keyword's importance. Use [Zero-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html) to try to categorize documents into predefined topics ("zero-shot topics") before the clustering the remaining, unclassified documents, using the default unsupervised BERTopic topic exploration algorithm.
Owner

Suggested change
Use Guided BERTopic to boost certain keyword's importance. Use [Zero-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html) to try to categorize documents into predefined topics ("zero-shot topics") before the clustering the remaining, unclassified documents, using the default unsupervised BERTopic topic exploration algorithm.
Use Guided BERTopic to boost the importance of certain keywords. Use [Zero-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html) to try to categorize documents into predefined topics ("zero-shot topics") before clustering the remaining unclassified documents using the main algorithm of BERTopic.

Owner

What I'm missing in the description of the guided approach is that it is not purely focused on increasing a given keyword's importance. Due to its procedure, it also steers certain documents towards certain clusters. Moreover, guided topic modeling is not primarily meant for increasing a keyword's importance. It instead tries to steer documents towards given clusters. The focus here is on seed topics, not seed keywords.

Author

Thank you for the changes and for the suggestion regarding the explanation of the guided approach. I'll have a look at this when time allows.



Guided Topic Modeling or Seeded Topic Modeling is a collection of techniques that guides the topic modeling approach by setting several seed topics to which the model will converge. These techniques allow the user to set a predefined number of topic representations that are sure to be in the documents. For example, take an IT business that has a ticket system for the software their clients use. Those tickets may typically contain information about a specific bug regarding login issues that the IT business is aware of.
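
As a rough illustration of what "steering documents towards seed topics" can mean, here is a toy sketch in plain Python. It is not BERTopic's implementation: BERTopic works with embeddings, whereas this sketch uses simple word overlap, and the helper name `provisional_seed_labels` and the sample tickets are hypothetical. Each ticket gets a provisional label for the seed topic it overlaps with most, or `-1` if it matches none, and such labels could then serve as weak supervision for a clustering step.

```python
def provisional_seed_labels(docs, seed_topic_list):
    """Toy sketch: label each doc with the index of the seed topic that
    shares the most words with it, or -1 if no seed word appears.
    (Hypothetical helper; real guided topic modeling uses embeddings.)"""
    labels = []
    for doc in docs:
        tokens = set(doc.lower().split())
        # Number of seed words from each seed topic found in the doc.
        overlaps = [len(tokens & set(seed)) for seed in seed_topic_list]
        best = max(range(len(overlaps)), key=overlaps.__getitem__)
        labels.append(best if overlaps[best] > 0 else -1)
    return labels


# One seed topic for the known login bug (words taken from the ticket example).
seed_topic_list = [["bug", "login", "password"]]
docs = [
    "cannot login password rejected",
    "please add dark mode",
    "login bug on mobile",
]
labels = provisional_seed_labels(docs, seed_topic_list)
print(labels)
```

Tickets that mention none of the seed words keep the label `-1`, so they remain free to form their own topics.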

To model that bug, we can create a seed topic representation containing the words `bug`, `login`, `password`,
4 changes: 4 additions & 0 deletions docs/getting_started/zeroshot/zeroshot.md
@@ -1,3 +1,7 @@
!!! Note
Difference between Zero-shot and Guided BERTopic:
Zero-shot Topic Modeling is similar - yet not equivalent - to [Guided BERTopic](https://maartengr.github.io/BERTopic/getting_started/guided/guided.html). Use [Guided BERTopic](https://maartengr.github.io/BERTopic/getting_started/guided/guided.html) to boost the importance of certain keywords. Use [Zero-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html) to try to categorize documents into predefined topics ("zero-shot topics") before clustering the remaining unclassified documents using the default unsupervised BERTopic topic exploration algorithm.

Zero-shot Topic Modeling is a technique that allows you to find predefined topics in large amounts of documents. When faced with many documents, you often have an idea of which topics will definitely be in there. That may be because you simply know your data or because a domain expert is involved in defining those topics.

This method allows you to not only find those specific topics but also create new topics for documents that would not fit with your predefined topics.
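
The two-stage idea described above (match documents to predefined topics first, then leave the rest for regular clustering) can be sketched with a toy example. This is only an illustration under stated assumptions, not BERTopic's implementation: it uses bag-of-words cosine similarity instead of embeddings, and the names `zero_shot_split` and `min_similarity`, the threshold value, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def zero_shot_split(docs, topics, min_similarity=0.3):
    """Assign each doc to its most similar predefined topic if the
    similarity reaches the threshold; otherwise leave it for clustering."""
    topic_vecs = {name: bow(name) for name in topics}
    assigned, remaining = {}, []
    for doc in docs:
        vec = bow(doc)
        best = max(topics, key=lambda name: cosine(vec, topic_vecs[name]))
        if cosine(vec, topic_vecs[best]) >= min_similarity:
            assigned[doc] = best
        else:
            remaining.append(doc)
    return assigned, remaining


docs = [
    "login bug after password reset",
    "invoice payment failed twice",
    "the weather was lovely today",
]
topics = ["login password bug", "invoice payment"]
assigned, remaining = zero_shot_split(docs, topics)
print(assigned)
print(remaining)
```

In this sketch, the documents in `remaining` are the ones that would be passed on to the unsupervised clustering step to form new topics.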