Add Pinecone VectorStore
 - Based on the official Pinecone Java library.
   The latter expects that indices are created externally via Ops.
 - Map Document metadata to and from Pinecone's internal Struct.
   The latter converts the metadata into the Pinecone JSON format.
 - Add integration tests and README.
tzolov authored and markpollack committed Nov 2, 2023
1 parent 001ee99 commit 81a5eaf
Showing 5 changed files with 810 additions and 0 deletions.
4 changes: 4 additions & 0 deletions pom.xml
@@ -28,6 +28,8 @@
<module>document-readers/pdf-reader</module>
<module>document-readers/tika-reader</module>
<module>embedding-clients/transformers-embedding</module>
<module>vector-stores/spring-ai-pinecone</module>

</modules>

<organization>
@@ -82,6 +84,8 @@
<pgvector.version>0.1.3</pgvector.version>
<postgresql.version>42.6.0</postgresql.version>
<milvus.version>2.3.0</milvus.version>
<pinecone.version>0.6.0</pinecone.version>
<protobuf-java-util.version>3.24.4</protobuf-java-util.version>

<!-- testing dependecies -->
<testcontainers.version>1.19.0</testcontainers.version>
113 changes: 113 additions & 0 deletions vector-stores/spring-ai-pinecone/README.md
@@ -0,0 +1,113 @@
# Pinecone VectorStore

This readme will walk you through setting up the Pinecone VectorStore to store document embeddings and perform similarity searches.

## What is Pinecone?

[Pinecone](https://www.pinecone.io/) is a popular cloud-based vector database, which allows you to store and search vectors efficiently.

## Prerequisites

1. Pinecone Account: Before you start, ensure you sign up for a [Pinecone account](https://app.pinecone.io/).
2. Pinecone Project: Once registered, create a new project, an index, and generate an API key. You'll need these details for configuration.
3. OpenAI Account: Create an account at [OpenAI Signup](https://platform.openai.com/signup) and generate an API key at [API Keys](https://platform.openai.com/account/api-keys).

## Configuration

To set up PineconeVectorStore, gather the following details from your Pinecone account:

* Pinecone API Key
* Pinecone Environment
* Pinecone Project ID
* Pinecone Index Name
* Pinecone Namespace

> **Note**
> This information is available to you in the Pinecone UI portal.

When creating the index, select a vector dimension of 1536. This matches the dimensionality of OpenAI's "text-embedding-ada-002" model, which we'll be using for this guide.

Additionally, you'll need to provide your OpenAI API Key. Set it as an environment variable like so:

```bash
export SPRING_AI_OPENAI_API_KEY='Your_OpenAI_API_Key'
```
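
A missing key tends to surface only as an authentication failure at request time, so it can help to check for the variable at startup. The following is a minimal sketch; the `RequiredEnv` class and its `get` method are hypothetical helpers for illustration and are not part of Spring AI:

```java
import java.util.Map;

// Hypothetical helper (not part of Spring AI): fails fast when a required variable is missing.
final class RequiredEnv {

    // Looks up `name` in the given environment map and rejects null or blank values.
    static String get(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // System.getenv() returns an unmodifiable view of the process environment.
        try {
            String key = get(System.getenv(), "SPRING_AI_OPENAI_API_KEY");
            System.out.println("OpenAI API key found (" + key.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Passing the environment in as a `Map` keeps the check easy to exercise in a unit test without touching the real process environment.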

## Dependencies

Add these dependencies to your project:

1. OpenAI: Required for calculating embeddings.

```xml
<dependency>
<groupId>org.springframework.experimental.ai</groupId>
<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
<version>0.7.0-SNAPSHOT</version>
</dependency>
```

2. Pinecone

```xml
<dependency>
<groupId>org.springframework.experimental.ai</groupId>
<artifactId>spring-ai-pinecone</artifactId>
<version>0.7.0-SNAPSHOT</version>
</dependency>
```

## Sample Code

To configure Pinecone in your application, you can use the following setup:

```java
@Bean
public PineconeVectorStoreConfig pineconeVectorStoreConfig() {

    // The environment, project ID, and index name below are sample values; replace them with your own.
    return PineconeVectorStoreConfig.builder()
        .withApiKey(System.getenv("PINECONE_API_KEY"))
        .withEnvironment("gcp-starter")
        .withProjectId("89309e6")
        .withIndexName("spring-ai-test-index")
        .withNamespace("") // Leave empty on the free tier, which does not support namespaces.
        .build();
}
```

Integrate with OpenAI's embeddings by adding the Spring Boot OpenAI starter to your project.
This provides an `EmbeddingClient` implementation, which can be injected into the vector store bean:

```java
@Bean
public VectorStore vectorStore(PineconeVectorStoreConfig config, EmbeddingClient embeddingClient) {
    return new PineconeVectorStore(config, embeddingClient);
}
```

In your main code, create some documents:

```java
List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!",
        Collections.singletonMap("meta1", "meta1")),
    new Document("Hello World Hello World Hello World Hello World Hello World Hello World Hello World"),
    new Document(
        "Great Depression Great Depression Great Depression Great Depression Great Depression Great Depression",
        Collections.singletonMap("meta2", "meta2")));
```

Add the documents to your vector store:

```java
vectorStore.add(documents);
```

And finally, retrieve documents similar to a query:

```java
List<Document> results = vectorStore.similaritySearch("Spring", 5);
```

If all goes well, you should retrieve the document containing the text "Spring AI rocks!!".
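
Conceptually, a similarity search embeds the query and ranks the stored vectors by how close they are to it. Pinecone does this server-side, using its own index structures and the metric configured on the index. The sketch below is only a brute-force, in-memory illustration using cosine similarity. The `TopKSketch` class, its methods, and the toy 3-dimensional vectors are assumptions made for illustration and are not part of Spring AI or the Pinecone client:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative only: brute-force top-k by cosine similarity over an in-memory map.
final class TopKSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|). Both vectors must have the same length.
    // A zero-length (all-zero) vector yields NaN; real stores typically reject such input.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k stored vectors most similar to the query, best match first.
    static List<String> topK(Map<String, float[]> store, float[] query, int k) {
        List<String> ids = new ArrayList<>(store.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> cosine(store.get(id), query)).reversed());
        return ids.subList(0, Math.min(k, ids.size()));
    }

    public static void main(String[] args) {
        // Toy vectors standing in for real 1536-dimensional embeddings.
        Map<String, float[]> store = Map.of(
                "spring", new float[] {0.9f, 0.1f, 0.0f},
                "hello", new float[] {0.1f, 0.9f, 0.0f},
                "depression", new float[] {0.0f, 0.1f, 0.9f});
        System.out.println(topK(store, new float[] {1.0f, 0.0f, 0.0f}, 2));
    }
}
```

The dimension check in `cosine` mirrors why the index dimension has to match the embedding model: a query vector and a stored vector of different lengths cannot be compared.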
70 changes: 70 additions & 0 deletions vector-stores/spring-ai-pinecone/pom.xml
@@ -0,0 +1,70 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.experimental.ai</groupId>
<artifactId>spring-ai</artifactId>
<version>0.7.0-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>
<artifactId>spring-ai-pinecone</artifactId>
<packaging>jar</packaging>
<name>spring-ai-pinecone</name>
<description>spring-ai-pinecone</description>
<url>https://github.com/spring-projects-experimental/spring-ai</url>

<scm>
<url>https://github.com/spring-projects-experimental/spring-ai</url>
<connection>git://github.com/spring-projects-experimental/spring-ai.git</connection>
<developerConnection>git@github.com:spring-projects-experimental/spring-ai.git</developerConnection>
</scm>

<properties>
<maven.compiler.target>17</maven.compiler.target>
<maven.compiler.source>17</maven.compiler.source>
</properties>

<dependencies>
<dependency>
<groupId>org.springframework.experimental.ai</groupId>
<artifactId>spring-ai-core</artifactId>
<version>${project.parent.version}</version>
</dependency>

<dependency>
<groupId>io.pinecone</groupId>
<artifactId>pinecone-client</artifactId>
<version>${pinecone.version}</version>
</dependency>

<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java-util</artifactId>
<version>${protobuf-java-util.version}</version>
</dependency>

<!-- TESTING -->
<dependency>
<groupId>org.springframework.experimental.ai</groupId>
<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
<version>${project.parent.version}</version>
<scope>test</scope>
</dependency>

<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>

<dependency>
<groupId>org.awaitility</groupId>
<artifactId>awaitility</artifactId>
<version>3.0.0</version>
<scope>test</scope>
</dependency>

</dependencies>

</project>