feat: rename "default configuration" to "preferred configuration" (#361)

ludwiktrammer authored Feb 20, 2025
1 parent c335ff1 commit 96e011f
Showing 29 changed files with 310 additions and 107 deletions.
5 changes: 5 additions & 0 deletions docs/api_reference/core/metadata-stores.md
@@ -0,0 +1,5 @@
# Metadata Stores

::: ragbits.core.metadata_stores.base.MetadataStore

::: ragbits.core.metadata_stores.in_memory.InMemoryMetadataStore
23 changes: 22 additions & 1 deletion docs/api_reference/document_search/processing.md
@@ -1,3 +1,24 @@
# Document Processing

::: ragbits.document_search.ingestion.document_processor.DocumentProcessorRouter

## Providers
::: ragbits.document_search.ingestion.providers.base.BaseProvider
    options:
        heading_level: 3

::: ragbits.document_search.ingestion.providers.dummy.DummyProvider
    options:
        heading_level: 3

::: ragbits.document_search.ingestion.providers.unstructured.UnstructuredDefaultProvider
    options:
        heading_level: 3

::: ragbits.document_search.ingestion.providers.unstructured.UnstructuredImageProvider
    options:
        heading_level: 3

::: ragbits.document_search.ingestion.providers.unstructured.UnstructuredPdfProvider
    options:
        heading_level: 3
6 changes: 6 additions & 0 deletions docs/api_reference/document_search/retrieval/rephrasers.md
@@ -0,0 +1,6 @@
# Query Rephrasers

::: ragbits.document_search.retrieval.rephrasers.QueryRephraser
::: ragbits.document_search.retrieval.rephrasers.LLMQueryRephraser
::: ragbits.document_search.retrieval.rephrasers.MultiQueryRephraser
::: ragbits.document_search.retrieval.rephrasers.NoopQueryRephraser
5 changes: 5 additions & 0 deletions docs/api_reference/document_search/retrieval/rerankers.md
@@ -0,0 +1,5 @@
# Rerankers

::: ragbits.document_search.retrieval.rerankers.base.Reranker
::: ragbits.document_search.retrieval.rerankers.litellm.LiteLLMReranker
::: ragbits.document_search.retrieval.rerankers.noop.NoopReranker
4 changes: 3 additions & 1 deletion docs/cli/main.md
@@ -1,6 +1,8 @@
# Ragbits CLI

Ragbits comes with a command line interface (CLI) that provides a number of commands for working with the Ragbits platform. It can be accessed by running the `ragbits` command in your terminal.
Ragbits comes with a command-line interface (CLI) that provides several commands for working with the Ragbits platform. It can be accessed by running the `ragbits` command in your terminal.

Commands that operate on Ragbits components, such as [`ragbits vector-store`](#ragbits-vector-store), use the project's preferred component implementations if a component configuration is not explicitly provided. To learn how to set component preferences in your project, see the [How to Set Preferred Components for Your Project](../how-to/core/component_preferrences.md) guide.

::: mkdocs-click
:module: ragbits.cli
159 changes: 159 additions & 0 deletions docs/how-to/core/component_preferrences.md
@@ -0,0 +1,159 @@
# How to Set Preferred Components for Your Project

## Introduction
When you use Ragbits in your project, you can set the preferred components for different component types (like embedders, vector stores, LLMs, etc.) in the project configuration. Typically, there are many different implementations for each type of component, and each implementation has its own configuration. Ragbits allows you to choose the implementation you prefer for each type of component and the configuration to be used along with it.

In this guide, you will learn two methods of setting the preferred components for your project: [by a factory function](#by-a-factory-function) and [by a YAML configuration file](#by-a-yaml-configuration-file). Preferred components are used automatically by the [Ragbits CLI](../../cli/main.md), and you will also learn [how to use them in your own code](#using-the-preferred-components). At the end of the guide, you will find a [list of component types](#list-of-component-types) for which you can set the preferred configuration.

## Setting the Preferred Components
You can specify the component preferences in two different ways: either by providing a factory function that creates the preferred instance of the component or by providing a YAML configuration file that contains the preferred configuration.

### By a Factory Function
To set the preferred component using a factory function, you need to create a function that takes no arguments and returns an instance of the component. You then set the full Python path to this function in the `[tool.ragbits.core.component_preference_factories]` section of your project's `pyproject.toml` file.

For example, to designate `QdrantVectorStore` (with an in-memory `AsyncQdrantClient`) as the preferred vector store implementation, you can create a factory function like this:

```python
from ragbits.core.vector_stores import QdrantVectorStore
from qdrant_client import AsyncQdrantClient

def get_qdrant_vector_store():
    return QdrantVectorStore(
        client=AsyncQdrantClient(location=":memory:"),
        index_name="my_index",
    )
```

Then, you set the full Python path to this function in the `[tool.ragbits.core.component_preference_factories]` section of your project's `pyproject.toml` file:

```toml
[tool.ragbits.core.component_preference_factories]
vector_store = "my_project.get_qdrant_vector_store"
```

The key `vector_store` is the name of the component type for which you are setting the preferred configuration. To see all possible component types, refer to the [List of Component Types](#list-of-component-types) section below. The `[tool.ragbits.core.component_preference_factories]` section may contain multiple keys, each corresponding to a different component type. For example:

```toml
[tool.ragbits.core.component_preference_factories]
vector_store = "my_project.get_qdrant_vector_store"
embedder = "my_project.get_litellm_embedder"
```
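Under the hood, a dotted path such as `my_project.get_qdrant_vector_store` has to be resolved to the callable it names. The snippet below is a minimal, stdlib-only sketch of that resolution mechanism — an illustration, not Ragbits' actual implementation; `collections.OrderedDict` merely stands in for a real project factory:

```python
import importlib


def import_factory(path: str):
    """Resolve a dotted path like 'my_project.get_vector_store' to the callable it names."""
    module_path, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Stand-in path; a real project would reference its own factory function here.
factory = import_factory("collections.OrderedDict")
instance = factory()
```

Calling the resolved factory with no arguments yields the preferred component instance, which is exactly the contract the factory functions above follow.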

<a name="llm-configuration"></a>
!!! info "LLM Specific Configuration"
    Ragbits can distinguish between LLMs depending on their capabilities. You can use a special `[tool.ragbits.core.llm_preference_factories]` section in your `pyproject.toml` file to set the preferred LLM factory functions for different types of LLMs. For example:

    ```toml
    [tool.ragbits.core.llm_preference_factories]
    text = "my_project.get_text_llm"
    vision = "my_project.get_vision_llm"
    structured_output = "my_project.get_structured_output_llm"
    ```

    The keys in the `[tool.ragbits.core.llm_preference_factories]` section are the names of the LLM types for which you are setting the preferred configuration. The possible LLM types are `text`, `vision`, and `structured_output`. The values are the full Python paths to the factory functions that create instances of the LLMs.

### By a YAML Configuration File
To set the preferred components using a YAML configuration file, you need to create a YAML file that contains the preferred configuration for different types of components. You then set the path to this file in the `[tool.ragbits.core]` section of your project's `pyproject.toml` file.

For example, to designate `QdrantVectorStore` (with an in-memory `AsyncQdrantClient`) as the preferred vector store implementation, you can create a YAML file like this:

```yaml
vector_store:
  type: QdrantVectorStore
  config:
    client:
      location: ":memory:"
    index_name: my_index
```
Then, you set the path to this file as `component_preference_config_path` in the `[tool.ragbits.core]` section of your project's `pyproject.toml` file:

```toml
[tool.ragbits.core]
component_preference_config_path = "preferred_instances.yaml"
```

When using subclasses built into Ragbits, you can use either the name of the class alone (like the `QdrantVectorStore` in the example above) or the full Python path to the class (like `ragbits.core.vector_stores.QdrantVectorStore`). For other classes (like your own custom implementations of Ragbits components), you must use the full Python path.
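The name-or-path lookup described above can be sketched with a small helper. The logic below is an assumption for illustration only, and the registry uses a stdlib class as a stand-in for Ragbits' built-in components:

```python
import importlib

# Hypothetical short-name registry; Ragbits maintains its own mapping of built-ins.
BUILTIN_COMPONENTS = {"Counter": "collections.Counter"}


def resolve_component_class(type_name: str) -> type:
    """Accept either a registered short name or a full dotted path to a class."""
    path = BUILTIN_COMPONENTS.get(type_name, type_name)
    module_path, _, class_name = path.rpartition(".")
    return getattr(importlib.import_module(module_path), class_name)


by_short_name = resolve_component_class("Counter")
by_full_path = resolve_component_class("collections.Counter")
```

Both lookups land on the same class, which mirrors how a YAML `type` value may be either a short name (for built-ins) or a full path (for custom implementations).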

In the example, the `vector_store` key is the name of the component type for which you are setting the preferred component. To see all possible component types, refer to the [List of Component Types](#list-of-component-types). The YAML configuration may contain multiple keys, each corresponding to a different component type. For example:

```yaml
vector_store:
  type: QdrantVectorStore
  config:
    client:
      location: ":memory:"
    index_name: my_index
embedder:
  type: LiteLLMEmbeddings
  config:
    model: text-embedding-3-small
```

<a name="ds-configuration"></a>
!!! info "`DocumentSearch` Specific Configuration"
    While you can provide `DocumentSearch` with a preferred configuration in the same way as other components (by setting the `document_search` key in the YAML configuration file), there is also a shortcut. If you don't provide a preferred configuration for `DocumentSearch` explicitly, it will look for your project's preferences regarding all the components that `DocumentSearch` needs (like `vector_store`, `provider`, `rephraser`, `reranker`, etc.) and create a `DocumentSearch` instance with your preferred components. This way, you don't have to configure those components twice (once for `DocumentSearch` and once for the component itself).

    This is an example of a YAML configuration file that sets the preferred configuration for `DocumentSearch` explicitly:

    ```yaml
    document_search:
      type: DocumentSearch
      config:
        embedder:
          type: NoopEmbeddings
        vector_store:
          type: InMemoryVectorStore
    ```

    This is an example of a YAML configuration file that sets the preferred configuration for `DocumentSearch` implicitly:

    ```yaml
    embedder:
      type: NoopEmbeddings
    vector_store:
      type: InMemoryVectorStore
    ```

    In both cases, `DocumentSearch` will use `NoopEmbeddings` as the preferred embedder and `InMemoryVectorStore` as the preferred vector store.
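The explicit-versus-implicit behaviour described above boils down to a simple fallback: use the `document_search` entry when present, otherwise assemble a configuration from the per-component preferences. A rough sketch of that logic (an assumption for illustration, not Ragbits' actual code):

```python
DOCUMENT_SEARCH_COMPONENTS = ("embedder", "vector_store", "rephraser", "reranker", "provider")


def resolve_document_search_config(preferences: dict) -> dict:
    """Pick the explicit `document_search` config if given, else build one from parts."""
    if "document_search" in preferences:
        # An explicit preference always wins.
        return preferences["document_search"].get("config", {})
    # Otherwise assemble the config from the individual component preferences.
    return {key: preferences[key] for key in DOCUMENT_SEARCH_COMPONENTS if key in preferences}


explicit = resolve_document_search_config(
    {"document_search": {"type": "DocumentSearch", "config": {"embedder": {"type": "NoopEmbeddings"}}}}
)
implicit = resolve_document_search_config(
    {"embedder": {"type": "NoopEmbeddings"}, "vector_store": {"type": "InMemoryVectorStore"}}
)
```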

## Using the Preferred Components
Preferred components are used automatically by the [Ragbits CLI](../../cli/main.md). The `ragbits` commands that work on components (like [`ragbits vector-store`](../../cli/main.md#ragbits-vector-store), [`ragbits document-search`](../../cli/main.md#ragbits-document-search), etc.) will use the component preferred for the given type unless instructed otherwise.

You can also retrieve preferred components in your own code by instantiating the component using the `preferred_subclass()` factory method of [the base class of the given component type](#list-of-component-types). This method will automatically create an instance of the preferred implementation of the component with the configuration you have set.

For example, the code below will create an instance of the preferred vector store implementation with the preferred configuration (as long as you have [set the preferred vector store in the project configuration](#how-to-set-preferred-components-for-your-project)):

```python
from ragbits.core.vector_stores import VectorStore
vector_store = VectorStore.preferred_subclass()
```

Note that `VectorStore` itself is an abstract class, so the instance created by `preferred_subclass()` will be an instance of one of the concrete subclasses of `VectorStore` that you have set as preferred in the project configuration.
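The pattern behind `preferred_subclass()` can be illustrated with a self-contained sketch. Everything below is a simplified assumption — the real method reads the project configuration rather than taking a name argument — but it shows how an abstract base can hand back an instance of a configured concrete subclass:

```python
from abc import ABC, abstractmethod


class ComponentBase(ABC):
    """Abstract base that can instantiate whichever concrete subclass is preferred."""

    @classmethod
    def preferred_subclass(cls, preferred_name: str) -> "ComponentBase":
        # Build a name -> class registry from the known concrete subclasses.
        registry = {subclass.__name__: subclass for subclass in cls.__subclasses__()}
        return registry[preferred_name]()

    @abstractmethod
    def store(self, item: str) -> None: ...


class InMemoryStore(ComponentBase):
    def __init__(self) -> None:
        self.items: list[str] = []

    def store(self, item: str) -> None:
        self.items.append(item)


# The abstract base returns an instance of the configured concrete subclass.
instance = ComponentBase.preferred_subclass("InMemoryStore")
```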

<a name="llm-usage"></a>
!!! note "LLM Specific Usage"
    If you [set the preferred LLM factory functions](#llm-configuration) in the project configuration, you can use the `get_preferred_llm()` function to create an instance of the preferred LLM for a given type. For example:

    ```python
    from ragbits.core.llms.factory import get_preferred_llm, LLMType

    text_llm = get_preferred_llm(LLMType.TEXT)  # one of: TEXT, VISION, STRUCTURED_OUTPUT
    ```

### List of Component Types
This is the list of component types for which you can set a preferred configuration:

| Key | Package | Base class | Notes |
|----------------------|---------------------------|---------------------------------------------------|----------------------------------------------|
| `embedder` | `ragbits-core` | [`Embeddings`][ragbits.core.embeddings.Embeddings]| |
| `llm` | `ragbits-core` | [`LLM`][ragbits.core.llms.LLM] | Specifics: [Configuration](#llm-configuration), [Usage](#llm-usage)|
| `metadata_store` | `ragbits-core` | [`MetadataStore`][ragbits.core.metadata_stores.base.MetadataStore]| |
| `vector_store` | `ragbits-core` | [`VectorStore`][ragbits.core.vector_stores.base.VectorStore]| |
| `history_compressor` | `ragbits-conversations` | [`ConversationHistoryCompressor`][ragbits.conversations.history.compressors.base.ConversationHistoryCompressor]| |
| `document_search` | `ragbits-document-search` | [`DocumentSearch`][ragbits.document_search.DocumentSearch]| Specifics: [Configuration](#ds-configuration)|
| `provider` | `ragbits-document-search` | [`BaseProvider`][ragbits.document_search.ingestion.providers.base.BaseProvider]| |
| `rephraser` | `ragbits-document-search` | [`QueryRephraser`][ragbits.document_search.retrieval.rephrasers.QueryRephraser]| |
| `reranker` | `ragbits-document-search` | [`Reranker`][ragbits.document_search.retrieval.rerankers.base.Reranker]| |
3 changes: 1 addition & 2 deletions docs/how-to/core/prompts_lab.md
@@ -33,11 +33,10 @@ To work with a specific prompt, select it from the list. The "Inputs" pane allow
Then, click "Render prompt" to view the final prompt content, with all placeholders replaced with the values you provided. To check how the Large Language Model responds to the prompt, click "Send to LLM".

!!! note
If there is no default LLM configured for your project, Prompts Lab will use OpenAI's gpt-3.5-turbo. Ensure that the OPENAI_API_KEY environment variable is set and contains your OpenAI API key.
If there is no [preferred LLM configured for your project](../core/component_preferrences.md), Prompts Lab will use OpenAI's gpt-3.5-turbo. Ensure that the OPENAI_API_KEY environment variable is set and contains your OpenAI API key.

Alternatively, you can use your own custom LLM factory (a function that creates an instance of [Ragbits's LLM class][ragbits.core.llms.LLM]) by specifying the path to the factory function using the `--llm-factory` option with the `ragbits prompts lab` command.

<!-- TODO: link to the how-to on configuring default LLMs in pyproject.toml -->

## Conclusion

6 changes: 2 additions & 4 deletions docs/quickstart/quickstart1_prompts.md
@@ -41,15 +41,13 @@ Where `path.within.your.project` is the path to the Python module where the prom
ragbits prompts exec song_prompt:SongPrompt
```

This command will send the prompt to the default Large Language Model and display the generated response in the terminal.
This command will send the prompt to [the project's preferred LLM implementation](../how-to/core/component_preferrences.md) and display the generated response in the terminal.

!!! note
If there is no default LLM configured for your project, Ragbits will use OpenAI's gpt-3.5-turbo. Ensure that the `OPENAI_API_KEY` environment variable is set and contains your OpenAI API key.
If there is no preferred LLM configured for your project, Ragbits will use OpenAI's gpt-3.5-turbo. Ensure that the `OPENAI_API_KEY` environment variable is set and contains your OpenAI API key.

Alternatively, you can use your custom LLM factory (a function that creates an instance of [Ragbits's LLM class][ragbits.core.llms.LLM]) by specifying the path to the factory function using the `--llm-factory` option with the `ragbits prompts exec` command.

<!-- TODO: link to the how-to on configuring default LLMs in pyproject.toml -->

## Using the Prompt in Python Code
To use the defined prompt with a Large Language Model in Python, you need to create an instance of the model and pass the prompt to it. For instance:

2 changes: 1 addition & 1 deletion examples/document-search/qdrant.py
@@ -75,7 +75,7 @@ async def main() -> None:
        model="text-embedding-3-small",
    )
    vector_store = QdrantVectorStore(
        client=AsyncQdrantClient(":memory:"),
        client=AsyncQdrantClient(location=":memory:"),
        index_name="jokes",
    )
    document_search = DocumentSearch(
5 changes: 5 additions & 0 deletions mkdocs.yml
@@ -12,6 +12,7 @@ nav:
  - How-to Guides:
      - Core:
          - how-to/core/use_prompting.md
          - how-to/core/component_preferrences.md
          - how-to/core/prompts_lab.md
          - how-to/core/promptfoo.md
          - how-to/core/use_tracing.md
@@ -39,12 +40,16 @@ nav:
          - api_reference/core/llms.md
          - api_reference/core/embeddings.md
          - api_reference/core/vector-stores.md
          - api_reference/core/metadata-stores.md
      - Document Search:
          - api_reference/document_search/index.md
          - api_reference/document_search/documents.md
          - Ingestion:
              - api_reference/document_search/processing.md
              - api_reference/document_search/execution_strategies.md
          - Retrieval:
              - api_reference/document_search/retrieval/rephrasers.md
              - api_reference/document_search/retrieval/rerankers.md
      - Conversations:
          - api_reference/conversations/compressors/base.md
          - api_reference/conversations/compressors/llm.md
15 changes: 8 additions & 7 deletions packages/ragbits-cli/src/ragbits/cli/_utils.py
@@ -6,7 +6,7 @@
from rich.console import Console

from ragbits.core.config import CoreConfig, core_config
from ragbits.core.utils.config_handling import InvalidConfigError, NoDefaultConfigError, WithConstructionConfig
from ragbits.core.utils.config_handling import InvalidConfigError, NoPreferredConfigError, WithConstructionConfig

WithConstructionConfigT_co = TypeVar("WithConstructionConfigT_co", bound=WithConstructionConfig, covariant=True)

@@ -16,8 +16,8 @@
# to be used as types: https://github.com/python/mypy/issues/4717
class WithConstructionConfigProtocol(Protocol[WithConstructionConfigT_co]):
    @classmethod
    def subclass_from_defaults(
        cls, defaults: CoreConfig, factory_path_override: str | None = None, yaml_path_override: Path | None = None
    def preferred_subclass(
        cls, config: CoreConfig, factory_path_override: str | None = None, yaml_path_override: Path | None = None
    ) -> WithConstructionConfigT_co: ...


@@ -30,7 +30,7 @@ def get_instance_or_exit(
    factory_path_argument_name: str = "--factory-path",
) -> WithConstructionConfigT_co:
    """
    Returns an instance of the provided class, initialized using its `subclass_from_defaults` method.
    Returns an instance of the provided class, initialized using its `preferred_subclass` method.
    If the instance can't be created, prints an error message and exits the program.
Args:
@@ -46,18 +46,19 @@

    type_name = type_name or to_snake(cls.__name__).replace("_", " ")
    try:
        return cls.subclass_from_defaults(
        return cls.preferred_subclass(
            core_config,
            factory_path_override=factory_path,
            yaml_path_override=yaml_path,
        )
    except NoDefaultConfigError as e:
    except NoPreferredConfigError as e:
        Console(
            stderr=True
        ).print(f"""You need to provide the [b]{type_name}[/b] instance to be used. You can do this by either:
- providing a path to a YAML configuration file with the [b]{yaml_path_argument_name}[/b] option
- providing a Python path to a function that creates a vector store with the [b]{factory_path_argument_name}[/b] option
- setting the default configuration or factory function in your project's [b]pyproject.toml[/b] file""")
- setting the preferred {type_name} configuration in your project's [b]pyproject.toml[/b] file
(see https://ragbits.deepsense.ai/how-to/core/component_preferrences/ for more information)""")
        raise typer.Exit(1) from e
    except InvalidConfigError as e:
        Console(stderr=True).print(e)
