Update docs
Skar0 committed Nov 8, 2024
1 parent 4be8f56 commit 56ea9fb
Showing 1 changed file with 53 additions and 25 deletions.
78 changes: 53 additions & 25 deletions docs/getting_started/representation/llm.md
@@ -468,56 +468,84 @@ representation_model = OpenAI(client, model="gpt-3.5-turbo", chat=True, prompt=s
The above is not limited to creating a short description or summary of the topic: we can also extract labels, keywords, poems, example documents, extensive descriptions, and more using this method!
If you want to have multiple representations of a single topic, it might be worthwhile to also check out [**multi-aspect**](https://maartengr.github.io/BERTopic/getting_started/multiaspect/multiaspect.html) topic modeling with BERTopic.


## **LangChain**

[LangChain](https://github.com/hwchase17/langchain) is a package that helps users chain large language models.
In BERTopic, we can leverage this package to combine external knowledge more efficiently. Here, this
external knowledge is the set of most representative documents in each topic.
[LangChain](https://github.com/hwchase17/langchain) can be used to generate descriptive topic labels in BERTopic. It supports both basic usage with language models and advanced usage with custom chains for full control over the generation process.

To use LangChain, you will need to install the `langchain` package first, along with the integration package for the underlying language model you choose. For example, to use OpenAI models:

```bash
pip install langchain
pip install langchain-openai
```

Then, you can create your chain as follows:
See the [LangChain integrations page](https://python.langchain.com/docs/integrations/chat/) for the full list of supported chat models and their required packages.

There are two main ways to use LangChain with BERTopic:

### **Basic Usage**

The simplest way is to use a language model with an optional custom prompt:

```python
from bertopic import BERTopic
from bertopic.representation import LangChain
from langchain_openai import ChatOpenAI

# Create a chat model
chat_model = ChatOpenAI(temperature=0, openai_api_key="...")

# Create your representation model with the pre-defined prompt
representation_model = LangChain(llm=chat_model)

# Use the representation model in BERTopic
topic_model = BERTopic(representation_model=representation_model)
```

Finally, you can pass the chain to BERTopic as follows:
You can also customize the prompt:

```python
from bertopic.representation import LangChain
prompt = "Here is a list of documents: [DOCUMENTS]. These documents are described by these keywords: [KEYWORDS]. Please give a short label."
representation_model = LangChain(llm=chat_model, prompt=prompt)
```

### **Advanced Usage**

For more control, you can create a custom LangChain chain to generate the representations. The representation of a topic can in that case be a single label or a list of labels, and must be directly returned by the chain.

Here's an example using multiple labels output:

```python
from bertopic import BERTopic
from bertopic.representation import LangChain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.output_parsers import CommaSeparatedListOutputParser

# Multiple Labels Output
# `chat_model` is the chat model created in the Basic Usage example
list_prompt = ChatPromptTemplate.from_template(
    "Here is a list of documents: {DOCUMENTS}. These documents are described by these keywords: {KEYWORDS}. Output a comma-separated list of labels that represents these documents."
)
list_chain = create_stuff_documents_chain(
    llm=chat_model,
    prompt=list_prompt,
    document_variable_name="DOCUMENTS",
    output_parser=CommaSeparatedListOutputParser(),
)

# Use in BERTopic
representation_model = LangChain(chain=list_chain)
topic_model = BERTopic(representation_model=representation_model)
```
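To make the expected chain output concrete, here is a standalone sketch of the kind of transformation a comma-separated list parser applies to a raw model reply. The function name `split_labels` and the sample reply are hypothetical, and LangChain's own `CommaSeparatedListOutputParser` may treat quoting and edge cases differently.

```python
from typing import List


def split_labels(raw_reply: str) -> List[str]:
    """Split a comma-separated model reply into clean, non-empty labels."""
    parts = [part.strip() for part in raw_reply.split(",")]
    return [part for part in parts if part]


# Hypothetical raw reply from the model for a single topic
reply = "sports, football , world cup,"
labels = split_labels(reply)
print(labels)
```

Whatever parser is used, the point is that the chain hands back the label list itself rather than a wrapper object around it.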

You can also customize the prompt, and include the optional `keywords` placeholder to add the keywords to the prompt.
!!! note
    When creating custom chains, the prompt uses LangChain's syntax with curly braces: `{DOCUMENTS}` and `{KEYWORDS}` instead of BERTopic's `[DOCUMENTS]` and `[KEYWORDS]`.
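
The difference between the two placeholder styles can be illustrated in plain Python. This is only a sketch: the sample documents and keywords are made up, and the string replacement and `str.format` calls stand in for whatever substitution BERTopic and LangChain perform internally.

```python
# Made-up inputs for illustration
documents = ["The match ended 2-1.", "The striker scored twice."]
keywords = ["football", "goal", "match"]

# BERTopic-style prompt: square-bracket tags, filled here by string replacement
bertopic_prompt = (
    "Here is a list of documents: [DOCUMENTS]. "
    "Keywords: [KEYWORDS]. Please give a short label."
)
filled_bertopic = (
    bertopic_prompt
    .replace("[DOCUMENTS]", "\n".join(documents))
    .replace("[KEYWORDS]", ", ".join(keywords))
)

# LangChain-style prompt: curly-brace variables, filled here with str.format
langchain_prompt = (
    "Here is a list of documents: {DOCUMENTS}. "
    "Keywords: {KEYWORDS}. Please give a short label."
)
filled_langchain = langchain_prompt.format(
    DOCUMENTS="\n".join(documents),
    KEYWORDS=", ".join(keywords),
)

print(filled_langchain)
```

Both templates produce the same final text here. Only the placeholder syntax differs, so a prompt written for one style needs its tags rewritten before it is used with the other.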

### **Chain Configuration**

You can configure the chain with different parameters or add callbacks. For example, to handle rate limits when using external APIs, you can control the number of concurrent requests:

```python
# Limit concurrent requests to the LLM provider to stay under rate limits.
# `list_chain` is the chain built in the Advanced Usage example above.
representation_model = LangChain(
    chain=list_chain,
    chain_config={"max_concurrency": 5},
)
```
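As a rough intuition for what a concurrency cap does, the following standalone sketch runs 20 simulated requests through a thread pool limited to five workers. It assumes `max_concurrency` caps the number of calls in flight, as its name suggests. `fake_llm_call` is a hypothetical stand-in for a real API request, and the thread pool is an analogy, not a description of how LangChain or BERTopic schedule calls.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 5  # mirrors the value used in chain_config

active = 0  # calls currently in flight
peak = 0    # highest number of simultaneous calls observed
lock = threading.Lock()


def fake_llm_call(topic_id: int) -> str:
    """Hypothetical stand-in for one request to an external LLM API."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # simulate network latency
    with lock:
        active -= 1
    return f"label for topic {topic_id}"


# The pool never runs more than MAX_CONCURRENCY calls at the same time
with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
    labels = list(pool.map(fake_llm_call, range(20)))

print(len(labels), "labels generated; peak concurrency:", peak)
```

A lower cap trades throughput for a smaller chance of hitting provider rate limits, so the right value depends on the limits of the API in use.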
