From 04d9683f49ed2a16832924d062ce800f3b4037cb Mon Sep 17 00:00:00 2001 From: Jay Bryant Date: Thu, 9 Nov 2023 11:33:32 -0600 Subject: [PATCH] Editing pass for clarity, grammar, spelling, punctuation, and usage. Also added several anchors and links. --- .../modules/ROOT/pages/api/aiclient.adoc | 81 +++++++------ .../modules/ROOT/pages/api/vectordbs.adoc | 114 +++++++++--------- .../antora/modules/ROOT/pages/concepts.adoc | 109 ++++++++--------- .../modules/ROOT/pages/getting-started.adoc | 41 +++---- .../antora/modules/ROOT/pages/glossary.adoc | 4 +- .../main/antora/modules/ROOT/pages/index.adoc | 21 ++-- .../pages/providers/huggingface/index.adoc | 19 ++- 7 files changed, 195 insertions(+), 194 deletions(-) diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/aiclient.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/aiclient.adoc index f34bb556f01..bf8c1bf9ca9 100644 --- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/aiclient.adoc +++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/aiclient.adoc @@ -1,20 +1,18 @@ +[[AiClient]] = AiClient -== Overview - -The AiClient interface streamlines interactions with xref:concepts.adoc#_models[AI Models]. -It simplifies connecting to various AI Models — each with potentially unique APIs — by offering a uniform interface for interaction. +The `AiClient` interface streamlines interactions with xref:concepts.adoc#_models[AI Models]. +It simplifies connecting to various AI Models, each with potentially unique APIs, by offering a uniform interface for interaction. Currently, the interface supports only text-based input and output. -You should expect some of the classes and interfaces to change as we support for other input and output types is implemented. +You should expect some of the classes and interfaces to change as we add other input and output types.
-The design of the AiClient interface centers around two primary goals: +The design of the `AiClient` interface centers around two primary goals: -1. *Portability*: It allows easy integration with different AI Models, allowing developers to switch between differing AI models with minimal code changes. +* *Portability*: It allows easy integration with different AI Models, letting developers switch between differing AI models with minimal code changes. This design aligns with Spring's philosophy of modularity and interchangeability. - -2. *Simplicity*: Using companion classes like `Prompt` for input encapsulation and `AiResponse` for output handling, the `AiClient` interface simplifies communication with AI Models. It manages the complexity of request preparation and response parsing, offering a direct and simplified API interaction. +* *Simplicity*: By using companion classes like `Prompt` for input encapsulation and `AiResponse` for output handling, the `AiClient` interface simplifies communication with AI Models. It manages the complexity of request preparation and response parsing, offering a direct and simplified API interaction. == API Overview @@ -36,13 +34,11 @@ public interface AiClient { The `generate` method with a `String` parameter simplifies initial use, avoiding the complexities of the more sophisticated `Prompt` and `AiResponse` classes. - === Prompt -In a real-world application, it will be most common to use the generate method, taking a `Prompt` instance and returning an `AiResponse`. +In a real-world application, it is most common to use the `generate` method, taking a `Prompt` instance and returning an `AiResponse`. The `Prompt` class encapsulates a list of `Message` objects. 
-Below is a truncated version of the Prompt class, excluding constructors and other utility methods: - +The following listing shows a truncated version of the `Prompt` class, excluding constructors and other utility methods: ```java public class Prompt { @@ -57,7 +53,6 @@ public class Prompt { The `Message` interface encapsulates a textual message, a collection of attributes as a `Map`, and a categorization known as `MessageType`. The interface is defined as follows: - ```java public interface Message { @@ -70,15 +65,14 @@ public interface Message { } ``` -The Message interface has various implementations corresponding to the categories of messages that an AI model can process. +The `Message` interface has various implementations that correspond to the categories of messages that an AI model can process. Some models, like OpenAI's chat completion endpoint, distinguish between message categories based on conversational roles, effectively mapped by the `MessageType`. - -For instance, OpenAI recognizes message categories for distinct conversational roles such as "system", "user", or "assistant". -While the term MessageType might imply a specific message format, in this context, it effectively designates the role a message plays in the dialogue. +For instance, OpenAI recognizes message categories for distinct conversational roles such as "`system,`" "`user,`" or "`assistant.`" +While the term `MessageType` might imply a specific message format, in this context, it effectively designates the role a message plays in the dialogue. For AI models that do not use specific roles, the `UserMessage` implementation acts as a standard category, typically representing user-generated inquiries or instructions. -To understand the practical application and the relationship between Prompt and Message, especially in the context of these roles or message categories, please refer to the detailed explanations in the Prompts section.
+To understand the practical application and the relationship between `Prompt` and `Message`, especially in the context of these roles or message categories, see the detailed explanations in the <> section. === AiResponse @@ -95,11 +89,11 @@ public class AiResponse { The `AiResponse` class holds the AI Model's output, with each `Generation` instance containing one of potentially multiple outputs from a single prompt. -The `AiResponse` class additionally carries a map of key-value pairs providing metadata about the AI Model's response. This feature is still in progress and is not elaborated on in this document. +The `AiResponse` class also carries a map of key-value pairs providing metadata about the AI Model's response. This feature is still in progress and is not elaborated on in this document. === Generation -Finally, the `Generation` class contains a String representing the output text and a map that provides metadata about this response. +Finally, the `Generation` class contains a `String` that represents the output text and a map that provides metadata about this response: ```java @@ -114,30 +108,38 @@ public class Generation { == Available Implementations -These are the available implementations of the `AiClient` interface +The `AiClient` interface has the following available implementations: -* OpenAI - Using the https://github.com/TheoKanning/openai-java[Theo Kanning client library]. -* Azure OpenAI - Using https://learn.microsoft.com/en-us/java/api/overview/azure/ai-openai-readme?view=azure-java-preview[Microsoft's OpenAI client library]. -* Hugging Face - Using the https://huggingface.co/inference-endpoints[Hugging Face Hosted Inference Service]. This gives you access to hundreds of models. -* https://ollama.ai/[Ollama] - Run large language models, locally. +* OpenAI: Using the https://github.com/TheoKanning/openai-java[Theo Kanning client library]. 
+* Azure OpenAI: Using https://learn.microsoft.com/en-us/java/api/overview/azure/ai-openai-readme?view=azure-java-preview[Microsoft's OpenAI client library]. +* Hugging Face: Using the https://huggingface.co/inference-endpoints[Hugging Face Hosted Inference Service]. This gives you access to hundreds of models. +* https://ollama.ai/[Ollama]: Run large language models locally. Planned implementations -* Amazon Bedrock - This can provide access to many AI models. -* Google Vertex - Providing access to 'Bard', aka Palm2 +* Amazon Bedrock: This can provide access to many AI models. +* Google Vertex: Providing access to PaLM 2, the model behind Bard. -Others are welcome, the list is not at all closed. +Others are welcome. The list is not at all closed. == OpenAI-Compatible Models A variety of models compatible with the OpenAI API are available, including those that can be operated locally, such as [LocalAI](https://github.com/mudler/LocalAI). The standard configuration for connecting to the OpenAI API is through the `spring.ai.openai.baseUrl` property, which defaults to `https://api.openai.com`. -To link the OpenAI client to a compatible model that utilizes the OpenAI API, you should adjust the `spring.ai.openai.baseUrl` property to the corresponding URL of the model you wish to connect to. +To link the OpenAI client to a compatible model that uses the OpenAI API, you should adjust the `spring.ai.openai.baseUrl` property to the corresponding URL of the model you wish to connect to.
== Configuration +This section describes how to configure models, including: + +* <> +* <> +* <> +* <> + +[[openan-api]] === OpenAI -Add the Spring Boot starter to you project's dependencies +Add the Spring Boot starter to your project's dependencies: [source, xml] ---- @@ -148,7 +150,7 @@ Add the Spring Boot starter to you project's dependencies ---- -This will make an instance of the `AiClient` that is backed by the https://github.com/TheoKanning/openai-java[Theo Kanning client library] available for injection in your application classes. +This makes an instance of the `AiClient` that is backed by the https://github.com/TheoKanning/openai-java[Theo Kanning client library] available for injection in your application classes. The Spring AI project defines a configuration property named `spring.ai.openai.api-key` that you should set to the value of the `API Key` obtained from `openai.com`. @@ -159,12 +161,13 @@ Exporting an environment variable is one way to set that configuration property. export SPRING_AI_OPENAI_API_KEY= ---- +[[azure-openai-api]] === Azure OpenAI -This will make an instance of the `AiClient` that is backed by the https://learn.microsoft.com/en-us/java/api/overview/azure/ai-openai-readme?view=azure-java-preview[Microsoft's OpenAI client library] available for injection in your application classes. +This makes an instance of the `AiClient` that is backed by https://learn.microsoft.com/en-us/java/api/overview/azure/ai-openai-readme?view=azure-java-preview[Microsoft's OpenAI client library] available for injection in your application classes. The Spring AI project defines a configuration property named `spring.ai.azure.openai.api-key` that you should set to the value of the `API Key` obtained from Azure. -There is also a configuraiton property named `spring.ai.azure.openai.endpoint` that you should set to the endpoint URL obtained when provisioning your model in Azure.
+There is also a configuration property named `spring.ai.azure.openai.endpoint` that you should set to the endpoint URL obtained when provisioning your model in Azure. Exporting environment variables is one way to set these configuration properties. @@ -174,9 +177,10 @@ export SPRING_AI_AZURE_OPENAI_API_KEY= export SPRING_AI_AZURE_OPENAI_ENDPOINT= ---- +[[hugging-face-api]] === Hugging Face -There is not yet a Spring Boot Starter for this client implementation, so you should add the dependency to the HuggingFace client implementation to your project's dependencies. +There is not yet a Spring Boot starter for this client implementation, so you should add the Hugging Face client implementation to your project's dependencies and export an environment variable: [source, xml] ---- @@ -192,11 +196,12 @@ There is not yet a Spring Boot Starter for this client implementation, so you sh export HUGGINGFACE_API_KEY=your_api_key_here ---- -Obtain the endpoint URL of the Inference Endpoint. You can find this on the Inference Endpoint's UI https://ui.endpoints.huggingface.co/[here]. +Obtain the endpoint URL of the inference endpoint. You can find it in the https://ui.endpoints.huggingface.co/[Inference Endpoints UI]. +[[ollama-api]] === Ollama -There is not yet a Spring Boot Starter for this client implementation, so you should add the dependency to the Ollama client implementation to your project's dependencies. +There is not yet a Spring Boot starter for this client implementation, so you should add the Ollama client implementation to your project's dependencies: [source, xml] ---- @@ -209,7 +214,7 @@ There is not yet a Spring Boot Starter for this client implementation, so you sh == Example Usage -A simple hello world example is shown below that uses the `AiClient's generate method that takes a `String` as input and returns a `String` as output. +The following listing shows a simple "`Hello, world`" example.
It uses the `AiClient.generate` method that takes a `String` as input and returns a `String` as output: [source,java] ---- diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs.adoc index 9fba140c5af..f4922b7f7a5 100644 --- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs.adoc +++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs.adoc @@ -1,22 +1,22 @@ +[[vector-databases]] = Vector Databases -== Introduction -Vector databases are a specialized type of database that plays an essential role in AI applications. +A vector database is a specialized type of database that plays an essential role in AI applications. In vector databases, queries differ from traditional relational databases. Instead of exact matches, they perform similarity searches. -When given a vector as a query, a vector database returns vectors that are "similar" to the query vector. -Further details on how this similarity is calculated at a high-level is provided in a later section. +When given a vector as a query, a vector database returns vectors that are "`similar`" to the query vector. +Further details on how this similarity is calculated at a high level are provided in a <>. Vector databases are used to integrate your data with AI models. The first step in their usage is to load your data into a vector database. -Then, when a user query is to be sent to the AI model, a set of similar documents is retrieved first. -These documents then serve as the context for the user's question and are sent to the AI model along with the user's query. -This technique is known as Retrieval Augmented Generation. +Then, when a user query is to be sent to the AI model, a set of similar documents is first retrieved. +These documents then serve as the context for the user's question and are sent to the AI model, along with the user's query. +This technique is known as <>.
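To make the retrieve-then-augment flow concrete, the following is a minimal sketch of the augmentation step only. It assumes that the similar documents have already been retrieved as plain strings. The class name `RagPromptSketch`, the `augment` method, and the instruction wording are hypothetical and are not part of the Spring AI API.

```java
import java.util.List;

/**
 * Hypothetical sketch of the "augment" step of Retrieval Augmented Generation.
 * Documents already retrieved from a vector store are written into the prompt
 * as context, followed by the user's question. This is not a Spring AI API,
 * and the instruction wording is only an example.
 */
public class RagPromptSketch {

    public static String augment(String question, List<String> retrievedDocuments) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Use the following context to answer the question.\n");
        prompt.append("Context:\n");
        for (String document : retrievedDocuments) {
            // Each retrieved document becomes one bullet of context.
            prompt.append("- ").append(document).append('\n');
        }
        // The user's question goes last, after the context it depends on.
        prompt.append("Question: ").append(question);
        return prompt.toString();
    }

    public static void main(String[] args) {
        List<String> retrieved = List.of(
                "Spring AI defines a VectorStore interface.",
                "PgVectorStore stores embeddings in PostgreSQL.");
        System.out.println(augment("Which vector stores can I use?", retrieved));
    }
}
```

The resulting string is what would be sent to the AI model in place of the bare question.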
-In the following sections, we will describe the Spring AI interface for using multiple vector database implementations and some high-level sample usage. +The following sections describe the Spring AI interface for using multiple vector database implementations and some high-level sample usage. -The last section attempts to demystify the underlying approach of similarity search of vector databases. +The last section attempts to demystify the underlying approach of similarity searching in vector databases. == API Overview This section serves as a guide to the `VectorStore` interface and its associated classes within the Spring AI framework. @@ -47,48 +47,50 @@ public interface VectorStore { ``` To insert data into the vector database, encapsulate it within a `Document` object. -The `Document` class encapsulates content from a data source, such as a PDF or Word document, and includes text represented as a String. -It also contains metadata in the form of key-value pairs, including details like the filename. +The `Document` class encapsulates content from a data source, such as a PDF or Word document, and includes text represented as a string. +It also contains metadata in the form of key-value pairs, including details such as the filename. -Upon insertion into the vector database, the text content is transformed into a numerical array, or a `List`, known as vector embeddings, using an Embedding model. Embedding models like https://en.wikipedia.org/wiki/Word2vec[Word2Vec], https://en.wikipedia.org/wiki/GloVe_(machine_learning)[GLoVE], and https://en.wikipedia.org/wiki/BERT_(language_model)[BERT], or OpenAI's `text-embedding-ada-002` model are used to convert words, sentences, or paragraphs into these vector embeddings. +Upon insertion into the vector database, the text content is transformed into a numerical array, or a `List`, known as vector embeddings, using an embedding model. 
Embedding models, such as https://en.wikipedia.org/wiki/Word2vec[Word2Vec], https://en.wikipedia.org/wiki/GloVe_(machine_learning)[GloVe], https://en.wikipedia.org/wiki/BERT_(language_model)[BERT], or OpenAI's `text-embedding-ada-002`, are used to convert words, sentences, or paragraphs into these vector embeddings. -The vector database's role is to store and facilitate similarity searches for these embeddings; it does not generate the embeddings itself. For creating vector embeddings, the `EmbeddingClient` should be utilized. +The vector database's role is to store and facilitate similarity searches for these embeddings. It does not generate the embeddings itself. To create vector embeddings, use the `EmbeddingClient`. -The `similaritySearch` methods in the interface allow for retrieving documents similar to a given query string. These methods can be fine-tuned using the following parameters: - -* k - An integer that specifies the maximum number of similar documents to return. This is often referred to as a 'top K' search, or 'K nearest neighbors' (KNN). -* threshold - A double value ranging from 0 to 1, where values closer to 1 indicate higher similarity. By default, if you set a threshold of 0.75, for instance, only documents with a similarity above this value will be returned. -* Filter.Expression - A class used for passing a Fluent DSL (Domain-Specific Language) expression that functions similarly to a 'where' clause in SQL, but it applies exclusively to the metadata key-value pairs of a Document. -* filterExpression - An external DSL based on ANTLR4 that accepts filter expressions as strings. For example, with metadata keys like country, year, and isActive, you could use an expression such as country == 'UK' && year >= 2020 && isActive == true. +The `similaritySearch` methods in the interface allow for retrieving documents similar to a given query string.
These methods can be fine-tuned by using the following parameters: +* `k`: An integer that specifies the maximum number of similar documents to return. This is often referred to as a 'top K' search, or 'K nearest neighbors' (KNN). +* `threshold`: A double value ranging from 0 to 1, where values closer to 1 indicate higher similarity. For example, if you set a threshold of 0.75, only documents with a similarity above this value are returned. +* `Filter.Expression`: A class used for passing a fluent DSL (Domain-Specific Language) expression that functions similarly to a 'where' clause in SQL, but it applies exclusively to the metadata key-value pairs of a `Document`. +* `filterExpression`: An external DSL based on ANTLR4 that accepts filter expressions as strings. For example, with metadata keys such as `country`, `year`, and `isActive`, you could use an expression such as `country == 'UK' && year >= 2020 && isActive == true`. == Available Implementations -These are the available implementations of the `VectorStore` interface +These are the available implementations of the `VectorStore` interface: * `InMemoryVectorStore` * `SimplePersistentVectorStore` -* Pinecone - https://www.pinecone.io/[PineCone] vector store. -* PgVector [`PgVectorStore`] - The https://github.com/pgvector/pgvector[PostgreSQL/PGVector] vector store. -* Milvus [`MilvusVectorStore`] - The https://milvus.io/[Milvus] vector store -* Neo4j [`Neo4jVectorStore`]- The https://neo4j.com/[Neo4j] vector store +* Pinecone: The https://www.pinecone.io/[Pinecone] vector store. +* PgVector [`PgVectorStore`]: The https://github.com/pgvector/pgvector[PostgreSQL/PGVector] vector store. +* Milvus [`MilvusVectorStore`]: The https://milvus.io/[Milvus] vector store. +* Neo4j [`Neo4jVectorStore`]: The https://neo4j.com/[Neo4j] vector store. -More implementations will be supported in future releases. +More implementations may be supported in future releases.
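To illustrate what the `k` and `threshold` parameters control, here is a minimal brute-force sketch of a similarity search over embeddings that are already held in memory. It scores documents with cosine similarity (the measure derived in the similarity section of this page). It is not the code of `InMemoryVectorStore` or any other Spring AI class: `TopKSearchSketch`, `cosine`, and `search` are hypothetical names, and the two-dimensional vectors stand in for real embeddings, which typically have hundreds or thousands of dimensions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical brute-force similarity search, for illustration only.
 * Assumes the embeddings were already computed (for example, by an EmbeddingClient).
 */
public class TopKSearchSketch {

    /** Cosine similarity: dot(a, b) / (||a|| * ||b||). Zero vectors yield NaN. */
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns at most k document ids whose similarity is at or above threshold, best match first. */
    public static List<String> search(double[] query, Map<String, double[]> embeddings, int k, double threshold) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> e : embeddings.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score >= threshold) { // NaN scores fail this test and are skipped
                scored.add(Map.entry(e.getKey(), score));
            }
        }
        // Highest similarity first, then keep only the top k.
        scored.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, scored.size()); i++) {
            ids.add(scored.get(i).getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, double[]> store = Map.of(
                "doc-a", new double[] {1.0, 0.0},
                "doc-b", new double[] {0.7, 0.7},
                "doc-c", new double[] {0.0, 1.0});
        // The query points mostly along the x axis, so doc-c falls below the threshold.
        System.out.println(search(new double[] {1.0, 0.1}, store, 2, 0.5)); // prints [doc-a, doc-b]
    }
}
```

This full scan compares the query against every stored vector. Dedicated vector databases typically avoid that cost with approximate nearest-neighbor indexes, which is one reason to use one instead of a sketch like this.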
-If you have a vector database that needs to be supported by Spring AI, please open an issue on GitHub or, even better, submit a Pull Request with an implementation. +If you have a vector database that needs to be supported by Spring AI, open an issue on GitHub or, even better, submit a pull request with an implementation. == Example Usage -To compute the embeddings for a vector database, you need to pick an Embedding model that matches the higher-level AI model being used. +To compute the embeddings for a vector database, you need to pick an embedding model that matches the higher-level AI model being used. -For example, with OpenAI's ChatGPT, we use the `OpenAiEmbeddingClient` and the model name `text-embedding-ada-002`. +For example, with OpenAI's ChatGPT, we use the `OpenAiEmbeddingClient` and a model name of `text-embedding-ada-002`. -The Spring Boot Starter's auto-configuation for OpenAI makes an implementation of `EmbeddingClient` available in the Spring Application Context for Dependency Injection. +The Spring Boot starter's auto-configuration for OpenAI makes an implementation of `EmbeddingClient` available in the Spring application context for dependency injection. The general usage of loading data into a vector store is something you would do in a batch-like job, by first loading data into Spring AI's `Document` class and then calling the `save` method. -Given a `String` reference to a source file representing a JSON file with data we want to load into the vector database, we use Spring AI's `JsonReader` to load specific fields in the JSON, which splits them up into small pieces and then passes those small pieces to the vector store implementation. -The `VectorStore` implementation computes the embeddings and stores the JSON and the embedding in the vector database. 
+Given a `String` reference to a source file that represents a JSON file with data we want to load into the vector database, we use Spring AI's `JsonReader` to load specific fields in the JSON, which splits them up into small pieces and then passes those small pieces to the vector store implementation. +The `VectorStore` implementation computes the embeddings and stores the JSON and the embedding in the vector database: ```java @Autowired @@ -102,34 +104,32 @@ The `VectorStore` implementation computes the embeddings and stores the JSON and } ``` -Later, when a user question is passed into the AI model, a similarity search is done to retrieve similar documents, which are then 'stuffed' into the prompt as context for the user's question. +Later, when a user question is passed into the AI model, a similarity search is done to retrieve similar documents, which are then "`stuffed`" into the prompt as context for the user's question. ```java String question = List similarDocuments = store.similaritySearch(question); ``` -There are additional options to be passed into the `similaritySearch` method that defines how many documents to retrieve and a threshold of the similarity search. +Additional options can be passed into the `similaritySearch` method to define how many documents to retrieve and the similarity threshold. == Metadata Filters +This section describes the filters that you can apply to document metadata when running a similarity search. + === Filter String -You can pass in SQL like filter expressions as String to one of the similaritySearch overloads. +You can pass in an SQL-like filter expression as a `String` to one of the `similaritySearch` overloads. -For example +Consider the following examples: * `"country == 'BG'"` * `"genre == 'drama' && year >= 2020"` * `"genre in ['comedy', 'documentary', 'drama']"` - - - - === Filter.Expression You can create an instance of `Filter.Expression` with a `FilterExpressionbuilder` that exposes a fluent API.
-A simple example is +A simple example is as follows: [source, java] ---- @@ -137,7 +137,7 @@ FilterExpressionBuilder b = new FilterExpressionBuilder(); Expression expression = b.eq("country", "BG").build(); ---- -You can build up sophisticated expressions using the operators +You can build up sophisticated expressions by using the following operators: [source, text] ---- @@ -151,7 +151,7 @@ LE: '<=' NE: '!=' ---- -You can combine expressions using +You can combine expressions by using the following operators: [source,text] ---- @@ -159,13 +159,14 @@ AND: 'AND' | 'and' | '&&'; OR: 'OR' | 'or' | '||'; ---- -For example +Consider the following example: + [source,java] ---- Expression exp = b.and(b.eq("genre", "drama"), b.gte("year", 2020)).build(); ---- -You can also use the operators +You can also use the following operators: [source,text] ---- @@ -174,6 +175,8 @@ NIN: 'NIN' | 'nin'; NOT: 'NOT' | 'not'; ---- +Consider the following example: + [source,java] ---- Expression exp = b.and(b.eq("genre", "drama"), b.gte("year", 2020)).build(); @@ -182,20 +185,21 @@ Expression exp = b.and(b.eq("genre", "drama"), b.gte("year", 2020)).build(); == Understanding Vectors Vectors have dimensionality and a direction. -For example, the picture below depicts a two-dimensional vector stem:[\vec{a}] in the cartesian coordinate system pictured as an arrow. +For example, the following image depicts a two-dimensional vector stem:[\vec{a}] in the Cartesian coordinate system, pictured as an arrow. image::vector_2d_coordinates.png[] -The head of the vector stem:[\vec{a}] is at the point stem:[(a_1, a_2)] +The head of the vector stem:[\vec{a}] is at the point stem:[(a_1, a_2)]. The *x* coordinate value is stem:[a_1] and the *y* coordinate value is stem:[a_2]. The coordinates are also referred to as the components of the vector. +[[vectordbs-similarity]] == Similarity Several mathematical formulas can be used to determine if two vectors are similar.
One of the most intuitive to visualize and understand is cosine similarity. -Look at the following pictures that show three sets of graphs. +Consider the following image, which shows three sets of graphs: image::vector_similarity.png[] @@ -205,13 +209,13 @@ The vectors are considered unrelated when pointing perpendicular to each other a The angle between them, stem:[\theta], is a good measure of their similarity. How can the angle stem:[\theta] be computed? -We are all familiar with the https://en.wikipedia.org/wiki/Pythagorean_theorem#History[Pythagorean Theorem] +We are all familiar with the https://en.wikipedia.org/wiki/Pythagorean_theorem#History[Pythagorean Theorem]. image:pythagorean-triangle.png[] What about when the angle between *a* and *b* is not 90 degrees? -Enter the https://en.wikipedia.org/wiki/Law_of_cosines[Law of cosines] +Enter the https://en.wikipedia.org/wiki/Law_of_cosines[Law of cosines]. .Law of Cosines @@ -219,7 +223,7 @@ Enter the https://en.wikipedia.org/wiki/Law_of_cosines[Law of cosines] stem:[a^2 + b^2 - 2ab\cos\theta = c^2] **** -Showing this as a vector diagram +The following image shows this approach as a vector diagram: image:lawofcosines.png[] @@ -231,7 +235,7 @@ The magnitude of this vector is defined in terms of its components as: stem:[\vec{A} * \vec{A} = ||\vec{A}||^2 = A_1^2 + A_2^2 ] **** -and the dot product between two vectors stem:[\vec{A}] and stem:[\vec{B}] is defined in terms of its components as: +The dot product between two vectors stem:[\vec{A}] and stem:[\vec{B}] is defined in terms of their components as: .Dot Product @@ -239,7 +243,7 @@ and the dot product between two vectors stem:[\vec{A}] and stem:[\vec{B}] is def stem:[\vec{A} * \vec{B} = A_1B_1 + A_2B_2] **** -Rewriting the Law of Cosines with vector magnitudes and dot products gives: +Rewriting the Law of Cosines with vector magnitudes and dot products gives the following: .Law of Cosines in Vector form **** @@ -247,7 +251,7 @@ stem:[||\vec{A}||^2 + ||\vec{B}||^2
- 2||\vec{A}||||\vec{B}||\cos\theta = ||\vec **** -Replacing stem:[||\vec{C}||^2] with stem:[||\vec{B} - \vec{A}||^2] gives: +Replacing stem:[||\vec{C}||^2] with stem:[||\vec{B} - \vec{A}||^2] gives the following: .Law of Cosines in Vector form only in terms of stem:[\vec{A}] and stem:[\vec{B}] @@ -263,11 +267,11 @@ https://towardsdatascience.com/cosine-similarity-how-does-it-measure-the-similar stem:[similarity(vec{A},vec{B}) = \cos(\theta) = \frac{\vec{A}\cdot\vec{B}}{||\vec{A}\||\cdot||\vec{B}||] **** -This formula works for dimensions higher than 2 or 3, though it is hard to visualize, https://projector.tensorflow.org/[but can be done to some extent]. +This formula works for dimensions higher than 2 or 3, though it is hard to visualize. However, https://projector.tensorflow.org/[it can be visualized to some extent]. It is common for vectors in AI/ML applications to have hundreds or even thousands of dimensions. The similarity function in higher dimensions using the components of the vector is shown below. -It expands the two-dimensional definitions of Magnitude and Dot Product given previously to *N* dimensions using the https://en.wikipedia.org/wiki/Summation[Summation mathematical syntax]. +It expands the two-dimensional definitions of Magnitude and Dot Product given previously to *N* dimensions by using https://en.wikipedia.org/wiki/Summation[Summation mathematical syntax]. .Cosine Similarity with vector components **** diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/concepts.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/concepts.adoc index 7178cbc4020..95fcbdebd0d 100644 --- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/concepts.adoc +++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/concepts.adoc @@ -1,5 +1,8 @@ +[[concepts]] = AI Concepts +This section describes core concepts that Spring AI uses. We recommend reading it closely to understand the ideas behind how Spring AI is implemented. 
+ == Models AI models are algorithms designed to process and generate information, often mimicking human cognitive functions. @@ -9,8 +12,7 @@ There are many different types of AI models, each suited for a specific use case While ChatGPT and its generative AI capabilities have captivated users through text input and output, many models and companies offer diverse inputs and outputs. Before ChatGPT, many people were fascinated by text-to-image generation models such as Midjourney and Stable Diffusion. -The following table categorizes several models based on their input and output types. - +The following table categorizes several models based on their input and output types: [cols=3*, options=header] |=== @@ -36,47 +38,46 @@ The following table categorizes several models based on their input and output t |Text |Numbers -|Many, aka, Embeddings +|Many (AKA embeddings) |=== The initial focus of Spring AI is on models that process language input and provide language output, initially OpenAI + Azure OpenAI. -The last row in the previous table, which accepts text as input and output numbers, is more commonly known as Embedding text and represents the internal data structures used in an AI model. -Spring AI has support for Embeddings to support more advanced use cases. +The last row in the previous table, which accepts text as input and outputs numbers, is more commonly known as embedding text and represents the internal data structures used in an AI model. +Spring AI supports embeddings to enable more advanced use cases. What sets models like GPT apart is their pre-trained nature, as indicated by the "P" in GPT—Chat Generative Pre-Trained Transformer. -This pre-training feature transforms AI into a general developer tool that doesn't necessitate an extensive machine learning or model training background. - +This pre-training feature transforms AI into a general developer tool that does not require an extensive machine learning or model training background.
== Prompts -Prompts serve as the foundation for language-based inputs that guide an AI model to produce specific outputs. +Prompts serve as the foundation for the language-based inputs that guide an AI model to produce specific outputs. For those familiar with ChatGPT, a prompt might seem like merely the text entered into a dialog box that is sent to the API. However, it encompasses much more than that. -In many AI Models, the text for the prompt is not just a simple String. +In many AI Models, the text for the prompt is not just a simple string. ChatGPT's API has multiple text inputs within a prompt, with each text input being assigned a role. -For example, there is the system role, that instructs the model how to behave and sets the context for the interaction. +For example, there is the system role, which tells the model how to behave and sets the context for the interaction. There is also the user role, which is typically the input from the user. Crafting effective prompts is both an art and a science. ChatGPT was designed for human conversations. -This is quite a departure from using something like SQL to 'ask a question'. +This is quite a departure from using something like SQL to "`ask a question.`" One must communicate with the AI model akin to conversing with another person. Such is the importance of this interaction style that the term "Prompt Engineering" has emerged as its own discipline. There is a burgeoning collection of techniques that improve the effectiveness of prompts. Investing time in crafting a prompt can drastically improve the resulting output. -Sharing prompts has become a communcal practice, and there is active academic research being done on this subject. -As an example of how counter-intuitive it can be to create effective prompt, for example contrasting with SQL, recent research paper found that one of the most effective prompts you can use starts with the phrase, "Take a deep breath and work on this problem step by step".
-That should give you an indication of how language is so important. -We don't yet fully understand how to make the most effective use of previous iterations of this technology, such as ChatGPT 3.5, let alone new versions that are being developed. +Sharing prompts has become a communal practice, and there is active academic research being done on this subject. +As an example of how counterintuitive it can be to create an effective prompt (especially when contrasted with SQL), a https://URLhere[recent research paper] found that one of the most effective prompts you can use starts with the phrase, "`Take a deep breath and work on this problem step by step.`" +That should give you an indication of why language is so important. +We do not yet fully understand how to make the most effective use of previous iterations of this technology, such as ChatGPT 3.5, let alone new versions that are being developed. == Prompt Templates Creating effective prompts involves establishing the context of the request and substituting parts of the request with values specific to the user's input. -This process utilizes traditional text-based Template engines for prompt creation and management. +This process uses traditional text-based template engines for prompt creation and management. Spring AI employs the OSS library, StringTemplate, for this purpose. For instance, consider the simple prompt template: @@ -85,39 +86,38 @@ For instance, consider the simple prompt template: Tell me a {adjective} joke about {content}. ``` -In Spring AI, Prompt Templates can be likened to the 'View' in Spring MVC architecture. +In Spring AI, prompt templates can be likened to the "`View`" in Spring MVC architecture. A model object, typically a `java.util.Map`, is provided to populate placeholders within the template. -The 'rendered' string becomes the content of the Prompt supplied to the AI model. +The "`rendered`" string becomes the content of the prompt supplied to the AI model.
-There is considerable variability in the specific data format of the Prompt sent to the model. +There is considerable variability in the specific data format of the prompt sent to the model. Initially starting as simple strings, prompts have evolved to include multiple messages, where each string in each message represents a distinct role for the model. - == Tokens Tokens serve as the building blocks of how an AI model works. -On input, Models convert words to tokens, and on output, they convert tokens back to words. +On input, models convert words to tokens. On output, they convert tokens back to words. In English, one token roughly corresponds to 75% of a word. For reference, Shakespeare's complete works, totaling around 900,000 words, translates to approximately 1.2 million tokens. Perhaps more important is that Tokens = *`$`*. -In the context of hosted AI models, your charges are determined by the number of tokens utilized. Both input and output contribute to the overall token count. +In the context of hosted AI models, your charges are determined by the number of tokens used. Both input and output contribute to the overall token count. Also, models are subject to token limits, which restrict the amount of text processed in a single API call. -This threshold is often referred to as the 'context window'. The model won't process any text exceeding this limit. +This threshold is often referred to as the 'context window'. The model does not process any text that exceeds this limit. For instance, ChatGPT3 has a 4K token limit, while GPT4 offers varying options, such as 8K, 16K, and 32K. Anthropic's Claude AI model features a 100K token limit, and Meta's recent research yielded a 1M token limit model. To summarize the collected works of Shakespeare with GPT4, you need to devise software engineering strategies to chop up the data and present the data within the model's context window limits. -This is an area that the Spring AI project helps you with. 
+The Spring AI project helps you with this task. == Output Parsing The output of AI models traditionally arrives as a `java.util.String`, even if you ask for the reply to be in JSON. -It may be the correct JSON, but it isn't a JSON data structure. It is just a string. -Also, asking "for JSON" as part of the prompt isn't 100% accurate. +It may be the correct JSON, but it is not a JSON data structure. It is just a string. +Also, asking "`for JSON`" as part of the prompt is not 100% accurate. This intricacy has led to the emergence of a specialized field involving the creation of prompts to yield the intended output, followed by parsing the resulting simple string into a usable data structure for application integration. @@ -127,58 +127,58 @@ This challenge has prompted OpenAI to introduce 'OpenAI Functions' as a means to == Chaining Calls -A Chain is a concept that represents a series of calls to an AI model. +A chain is a concept that represents a series of calls to an AI model. It uses the output from one call as the input to another. By chaining calls together, you can support complex use cases by composing pipelines of multiple chains. == Bringing Your Data to the AI model -How can you equip the AI model with information it hasn't been trained on? +How can you equip the AI model with information on which it has not been trained? -It's important to note that the GPT 3.5/4.0 dataset extends only until September 2021. -Consequently, the model will say that it doesn't know the answer to questions that require knowledge beyond that date. -An interesting bit of trivia is that this dataset is around ~650GB. +Note that the GPT 3.5/4.0 dataset extends only until September 2021. +Consequently, the model says that it does not know the answer to questions that require knowledge beyond that date. +An interesting bit of trivia is that this dataset is around 650GB. Two techniques exist for customizing the AI model to incorporate your data: -1. 
Fine Tuning: This traditional Machine Learning technique involves tailoring the model and changing its internal weighting. -However, it's a challenging process for Machine Learning experts and extremely resource-intensive for models like GPT due to their size. Additionally, some models might not offer this option. +* Fine Tuning: This traditional machine learning technique involves tailoring the model and changing its internal weighting. +However, it is a challenging process for machine learning experts and extremely resource-intensive for models like GPT due to their size. Additionally, some models might not offer this option. -2. Prompt Stuffing: A more practical alternative involves embedding your data within the prompt provided to the model. Given a model's token limits, techniques are required to present relevant data within the model's context window. -This approach is colloquially referred to as 'stuffing the prompt'. +* Prompt Stuffing: A more practical alternative involves embedding your data within the prompt provided to the model. Given a model's token limits, techniques are required to present relevant data within the model's context window. +This approach is colloquially referred to as "`stuffing the prompt.`" -The Spring AI library helps you implement solutions based on the 'stuffing of the prompt' technique otherwise knowsn as Retrieval Augmented Generation +The Spring AI library helps you implement solutions based on the "`stuffing the prompt`" technique, otherwise known as Retrieval Augmented Generation (RAG). +[[concept-rag]] == Retrieval Augmented Generation A technique termed Retrieval Augmented Generation (RAG) has emerged to address the challenge of incorporating relevant data into prompts for accurate AI model responses. -The approach involves a batch processing style programming model, where the job reads unstructured data from your Documents, transforms it, and then writes it into a Vector Database.
+The approach involves a batch processing style programming model, where the job reads unstructured data from your documents, transforms it, and then writes it into a vector database. At a high level, this is an ETL (Extract, Transform and Load) pipeline. -The Vector Database will be used in the retrieval part of RAG technique. +The vector database is used in the retrieval part of the RAG technique. -As part of loading the unstructured data into the Vector Database, one of the most important transformations is to split up the original document into smaller pieces. -The procedure of splitting up the original document into smaller pieces has two important steps. +As part of loading the unstructured data into the vector database, one of the most important transformations is to split up the original document into smaller pieces. +The procedure of splitting up the original document into smaller pieces has two important steps: -1. Split up the document into parts while preserving the semantic boundaries of the content. +. Split up the document into parts while preserving the semantic boundaries of the content. For example, for a document with paragraphs and tables, one should avoid splitting the document in the middle of a paragraph or table. For code, avoid splitting the code in the middle of a method's implementation. -2. Split up the document's parts further into parts whose size is a small percentage of the AI Model's token limit. +. Split up the document's parts further into parts whose size is a small percentage of the AI Model's token limit. -The next phase in RAG, is processing user input. -When a user's question is to be answered by an AI model, the question along with all the 'similar' document pieces are placed into the prompt that is sent to the AI model. -This is the reason to use a Vector Database, it is very good at finding 'similar' content. +The next phase in RAG is processing user input.
+When a user's question is to be answered by an AI model, the question and all the "`similar`" document pieces are placed into the prompt that is sent to the AI model. +This is the reason to use a vector database: it is very good at finding "`similar`" content. There are several concepts that are used in implementing RAG. -The concepts map onto classes in Spring AI. -These are briefly described below +The concepts map onto classes in Spring AI: -* `DocumentReader` This is an Java functional interface that is responsible for loading a `List` from a data source. Common data sources are PDF, Markdown, and JSON. -* `Document` A text based representation of your data source that also contains metadata to describe the contents. -* `DocumentTransformer` This is responsible for processing the data in various ways, for example splitting up documents into smaller pieces or adding additional metadata to the `Document`. -* `DocumentWriter` This allows you to persist the Documents into a database, most commomly in the AI stack, a Vector Database. -* `Embedding` This is a representation of your data as a `List` that is used by the Vector Database to compute the 'similarity' of a user's query to relevant documents. +* `DocumentReader`: A Java functional interface that is responsible for loading a `List` from a data source. Common data sources are PDF, Markdown, and JSON. +* `Document`: A text-based representation of your data source that also contains metadata to describe the contents. +* `DocumentTransformer`: Responsible for processing the data in various ways (for example, splitting up documents into smaller pieces or adding additional metadata to the `Document`). +* `DocumentWriter`: Lets you persist the documents into a database (most commonly, in the AI stack, a vector database). +* `Embedding`: A representation of your data as a `List` that is used by the vector database to compute the "`similarity`" of a user's query to relevant documents.
== Evaluating AI responses @@ -189,9 +189,6 @@ This evaluation process involves analyzing whether the generated response aligns One approach involves presenting both the user's request and the AI model's response to the model, querying whether the response aligns with the provided data. -Furthermore, leveraging the information stored in the Vector Database as supplementary data can enhance the evaluation process, aiding in the determination of response relevance. +Furthermore, leveraging the information stored in the vector database as supplementary data can enhance the evaluation process, aiding in the determination of response relevance. The Spring AI project currently provides some very basic examples of how you can evaluate the responses in the form of prompts to include in a JUnit test. - - - diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/getting-started.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/getting-started.adoc index a1f2ec8731e..a99a7a468e9 100644 --- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/getting-started.adoc +++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/getting-started.adoc @@ -1,14 +1,19 @@ +[[getting-started]] = Getting Started +This section offers quick guidance on how to get started with Spring AI. + == API Keys +To use OpenAI or Azure OpenAI, you need to generate an API key. + === OpenAI -Create an account at link:https://platform.openai.com/signup[OpenAI Signup] and generate the token at link:https://platform.openai.com/account/api-keys[API Keys]. +Create an account on the https://platform.openai.com/signup[OpenAI signup page] and generate the token on the https://platform.openai.com/account/api-keys[API Keys page].
+The Spring AI project defines a configuration property named `spring.ai.openai.api-key` that you should set to the value of the `API Key` obtained from openai.com. -Exporting an environment variable is one way to set that configuration property. +Exporting an environment variable is one way to set that configuration property: [source,shell] ---- @@ -16,12 +21,12 @@ export SPRING_AI_OPENAI_API_KEY= ---- === Azure OpenAI -Obtain your Azure OpenAI `endpoint` and `api-key` from the Azure OpenAI Service section on link:https://portal.azure.com[Azure Portal]. +Obtain your Azure OpenAI `endpoint` and `api-key` from the Azure OpenAI Service section on the link:https://portal.azure.com[Azure Portal]. The Spring AI project defines a configuration property named `spring.ai.azure.openai.api-key` that you should set to the value of the `API Key` obtained from Azure. -There is also a configuraiton property named `spring.ai.azure.openai.endpoint` that you should set to the endpoint URL obtained when provisioning your model in Azure. +There is also a configuration property named `spring.ai.azure.openai.endpoint` that you should set to the endpoint URL obtained when provisioning your model in Azure. -Exporting environment variables is one way to set these configuration properties. +Exporting environment variables is one way to set these configuration properties: [source,shell] ---- @@ -32,8 +37,8 @@ export SPRING_AI_AZURE_OPENAI_ENDPOINT= == Dependencies The Spring AI project provides artifacts in the Spring Milestone Repository. -You will need to add configuration to add a reference to the Spring Milestone repository in your build file. -For example, in maven, add the following repository definition. +You need to add a reference to the Spring Milestone repository in your build file. +For example, in Maven, add the following repository definition:
---- -Add the Spring Boot Starter depending on if you are using Azure Open AI or Open AI. +Add the Spring Boot Starter, depending on whether you use Azure OpenAI or OpenAI: * Azure OpenAI [source, xml] @@ -62,7 +67,6 @@ Add the Spring Boot Starter depending on if you are using Azure Open AI or Open ---- * OpenAI - [source, xml] ---- @@ -74,13 +78,12 @@ Add the Spring Boot Starter depending on if you are using Azure Open AI or Open == Spring CLI -The Spring CLI makes it easy to create new applications with code in your terminal window. Think of it as the 'create-react-app' of Spring for those familiar with the JavaScript ecosystem. +The Spring CLI makes it easy to create new applications with code in your terminal window. Think of it as the `create-react-app` of Spring for those familiar with the JavaScript ecosystem. Download the latest https://github.com/spring-projects-experimental/spring-cli/releases[Spring CLI Release] - and follow the https://docs.spring.io/spring-cli/reference/installation.html#_setting_up_your_path_or_alias[instructions] to add `spring` to your `PATH`. -Create a simple AI application +Create a simple AI application: * For OpenAI @@ -88,38 +91,32 @@ Create a simple AI application spring boot new ai ``` -or - * For Azure OpenAI ```shell spring boot new ai-azure ``` -You can also `ADD` the same simple AI application to your current project using +You can also add the same simple AI application to your current project by using one of the following commands: * For OpenAI ```shell spring boot add ai ``` -or - * For Azure OpenAI ```shell spring boot add ai-azure ``` There is a project catalog available for Azure OpenAI that covers more functionality.
+Add the catalog by running the following command: ```shell spring project-catalog add ai-azure ``` -Now you have the following projects that you can use to create a new project using the `spring boot new` command or add to your existing project using the `spring boot add` command. +Now you have the following projects that you can use to create a new project by using the `spring boot new` command or add to your existing project by using the `spring boot add` command. ```shell spring project list diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/glossary.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/glossary.adoc index f7661ad474e..6a990b62dc9 100644 --- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/glossary.adoc +++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/glossary.adoc @@ -1,3 +1,3 @@ -[appendix] -[glossary] +[[appendix]] +[[glossary]] = Glossary diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/index.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/index.adoc index d650646ed6b..fea4b262f69 100644 --- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/index.adoc +++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/index.adoc @@ -1,23 +1,22 @@ +[[introduction]] = Spring AI -== Introduction - The Spring AI project aims to streamline the development of applications that incorporate artificial intelligence functionality without unnecessary complexity. -The project draws inspiration from notable Python projects such as LangChain and LlamaIndex, but Spring AI is not a direct port of those projects. -The project was founded with the belief that the next wave of Generative AI applications will not just be for Python developers only, but will be ubiquitous across many programming languages. +The project draws inspiration from notable Python projects, such as LangChain and LlamaIndex, but Spring AI is not a direct port of those projects.
+The project was founded with the belief that the next wave of Generative AI applications will not be only for Python developers but will be ubiquitous across many programming languages. At its core, Spring AI provides abstractions that serve as the foundation for developing AI applications. These abstractions have multiple implementations, enabling easy component swapping with minimal code changes. -For example, Spring AI introduces the AiClient interface with implementations for OpenAI and Azure OpenAI. +For example, Spring AI introduces the `AiClient` interface with implementations for OpenAI and Azure OpenAI. -In addition to these core abstractions, Spring AI aims to provide higher-level functionalities to address common use cases such as "Q&A over your documentation" or "Chat with your documentation." -As the complexity of the use cases increases, the Spring AI project will integrate with other projects in the Spring Ecosystem such as Spring Integration, Spring Batch, and Spring Data. +In addition to these core abstractions, Spring AI aims to provide higher-level functionalities to address common use cases such as "`Q&A over your documentation`" or "`Chat with your documentation.`" +As the complexity of the use cases increases, the Spring AI project will integrate with other projects in the Spring Ecosystem, such as Spring Integration, Spring Batch, and Spring Data. -To simplify setup, Spring Boot Starters are available to help set up essential dependencies and classes. +To simplify setup, Spring Boot starters are available to help set up essential dependencies and classes. There is also a collection of sample applications to help you explore the project's features. -Lastly, the new Spring CLI project also enables you to get started quickly using the command `spring boot new ai` for new projects or `spring boot add ai` for adding AI capabilities to your existing application. 
+Lastly, the new Spring CLI project also enables you to get started quickly by using the `spring boot new ai` command for new projects or the `spring boot add ai` command to add AI capabilities to your existing application. -The next section provides a high-level overview of AI concepts and their representation in Spring AI. -The Getting Started section shows you how to create your first AI application +The <> provides a high-level overview of AI concepts and their representation in Spring AI. +The <> section shows you how to create your first AI application. Subsequent sections delve into each component and common use cases with a code-focused approach. diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/providers/huggingface/index.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/providers/huggingface/index.adoc index c7e19e3af5c..06860b9863a 100644 --- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/providers/huggingface/index.adoc +++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/providers/huggingface/index.adoc @@ -1,15 +1,14 @@ -= HuggingFace +[[hugging-face]] += Hugging Face -== Introduction -One of the easiest ways you can get access to many Machine Learning and Artificial Intelligence models is by using the https://en.wikipedia.org/wiki/Hugging_Face[HuggingFace's] https://huggingface.co/inference-endpoints[Inference Endpoints]. +One of the easiest ways to access many machine learning and artificial intelligence models is by using https://en.wikipedia.org/wiki/Hugging_Face[Hugging Face's] https://huggingface.co/inference-endpoints[Inference Endpoints]. -Hugging Face Hub is a platform providing a collaborative environment for creating and sharing tens of thousands of Open Source ML/AI models, data sets, and demo applications. +The Hugging Face Hub is a platform that provides a collaborative environment for creating and sharing tens of thousands of Open Source ML/AI models, data sets, and demo applications.
-Inference Endpoints let you deploy AI Models on dedicated infrastructure with a pay as you go billing model. -You can use infrastructure provided by Amazon Web Services, Microsoft Azure and Google Cloud Platform. -Hugging Face lets you run the models on your own machine, but it is quite common to not have enough CPU/GPU resources to run the larger, more AI focused models. +Inference Endpoints let you deploy AI models on dedicated infrastructure with a pay-as-you-go billing model. +You can use infrastructure provided by Amazon Web Services, Microsoft Azure, and Google Cloud Platform. +Hugging Face lets you run the models on your own machine, but it is common not to have enough CPU/GPU resources to run the larger, more AI-focused models. -It provides access to Meta's recent (August 2023) Llama 2 and CodeLlama 2 models as well as providing the https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard[Open LLM Leaderboard] where you can quickly discover high quality models. +It provides access to Meta's recent (August 2023) Llama 2 and CodeLlama 2 models and hosts the https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard[Open LLM Leaderboard], where you can quickly discover high-quality models. -While Hugging Face has a free hosting tier, which is very useful for quickly evaluating if a specific ML/AI Model fits your needs, they do not let you access many of those models on the free tier using the https://huggingface.co/docs/text-generation-inference/main/en/index[Text Generation Interface API], so since you want to end up on production anyway, with a stable API, pony up a few cents to try out a reliable solution.
+While Hugging Face has a free hosting tier, which is very useful for quickly evaluating if a specific ML/AI Model fits your needs, they do not let you access many of those models on the free tier by using the https://huggingface.co/docs/text-generation-inference/main/en/index[Text Generation Interface API]. If you want to end up on production anyway, with a stable API, pay a few cents to try out a reliable solution. Prices are as low as $0.06 per CPU core/hr and $0.6 per GPU/hr.