Build your own Java RAG AI Agent

⬅ This is the next workshop step after step-1.

Code, moar code, MOAR CODE

🤩 This step builds the beginning of a functioning AI Agent by introducing

DPR – Dense Passage Retrieval
RAG – Retrieval Augmented Generation
VSS – Vector Similarity Search
Parsing unstructured text
Chunking strategies

♻️ And introduces the following technologies and techniques

Spring AI's Vector Stores
Apache Cassandra's Secondary Indexes and Vector data type
Apache Tika to parse unstructured documents into text

This step introduces a new decorating AI Agent, AiAgentVector, that adds RAG capabilities using Spring AI's VectorStore interface.

The prompt template system-prompt-qa.txt adds some text to tell the LLM about the results from the Vector Similarity Search (VSS).
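The decorator idea and the extra prompt text can be sketched roughly as follows. This is a minimal illustration, not the workshop's actual code: the names Agent, Retriever, BaseAgent and RetrievalAgent, and the "DOCUMENTS:" wording, are all hypothetical. In the workshop the retrieval role is played by Spring AI's VectorStore and the wording lives in system-prompt-qa.txt.

```java
import java.util.List;

// Hypothetical interface: the workshop's real agent interface may look different.
interface Agent {
    String buildPrompt(String question);
}

// Hypothetical stand-in for a similarity search over stored documents.
interface Retriever {
    List<String> similar(String query, int topK);
}

// The undecorated agent: passes the question through as-is.
final class BaseAgent implements Agent {
    @Override
    public String buildPrompt(String question) {
        return "QUESTION: " + question;
    }
}

// Decorator: wraps another Agent and prepends retrieved documents to its prompt.
final class RetrievalAgent implements Agent {
    private final Agent delegate;
    private final Retriever retriever;
    private final int topK;

    RetrievalAgent(Agent delegate, Retriever retriever, int topK) {
        this.delegate = delegate;
        this.retriever = retriever;
        this.topK = topK;
    }

    @Override
    public String buildPrompt(String question) {
        List<String> docs = retriever.similar(question, topK);
        StringBuilder prompt = new StringBuilder();
        if (!docs.isEmpty()) {
            // Tell the LLM which passages the similarity search returned.
            prompt.append("DOCUMENTS:\n");
            for (String doc : docs) {
                prompt.append("- ").append(doc).append('\n');
            }
        }
        return prompt.append(delegate.buildPrompt(question)).toString();
    }
}
```

Because RetrievalAgent is itself an Agent, it can wrap any other agent without that agent knowing about retrieval, which is the point of the decorator approach.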

📑 To upload documents (text or PDF files) go to the http://localhost:8080/upload url. Any unstructured text in files, e.g. PDFs, will be parsed to plain text by Apache Tika. The text is then split into chunks of 300 words, with each chunk overlapping the previous one by 150 words.
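To make the chunking concrete, here is a small sketch of word-based chunking with overlap. It is an illustration under assumptions, not the workshop's implementation: the class name WordChunker and its method are hypothetical, and words are assumed to be whitespace-separated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits text into overlapping chunks of whole words.
final class WordChunker {

    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need chunkSize > overlap >= 0");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        // Each chunk starts (chunkSize - overlap) words after the previous one.
        int step = chunkSize - overlap;
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last word has been covered, stop here
            }
        }
        return chunks;
    }
}
```

With the sizes from this step, WordChunker.chunk(text, 300, 150) on a 600-word text gives three chunks covering words 0–299, 150–449 and 300–599. The overlap means a sentence cut at one chunk boundary still appears whole in a neighbouring chunk.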

The implementation of VectorStore used is CassandraVectorStore. It automatically creates a default schema for itself, but it can be configured to use a different and/or existing table, if you so desire. Real use-cases are expected to have multiple vector stores, in different domains and on different data, hence this flexibility.

The default schema looks like…

CREATE TABLE datastax_ai_agent.vector_store (
    id text PRIMARY KEY,
    content text,
    embedding vector<float, 1536>
);

CREATE CUSTOM INDEX vector_store_embedding_idx
   ON datastax_ai_agent.vector_store (embedding)
   USING 'StorageAttachedIndex';

ℹ️ The vector dimension is set automatically to match the dimensions of the embedding model you have configured in Spring AI.
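As a rough illustration of what a Vector Similarity Search compares, and why the stored dimension has to match the embedding model, here is a sketch of cosine similarity between two embeddings. It assumes cosine is the similarity metric in use, and the class name Cosine is hypothetical. It shows the calculation only and says nothing about how the index itself is implemented.

```java
// Hypothetical helper: cosine similarity between two embedding vectors.
final class Cosine {

    static double similarity(float[] a, float[] b) {
        // Vectors of different dimensions cannot be compared, which is why the
        // table's vector dimension must match the embedding model's output.
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0. A similarity search returns the stored chunks whose embeddings score highest against the embedding of the question.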

🧐 You might have noticed the AiUploadUI class is in the .vector. package and wondered why UI and Agents are in the same package. This codebase is packaged by feature instead of by layer. This is a valuable approach that anyone who has refactored large legacy codebases may be familiar with. More info here.

🔎 To see the changes this step introduces, use git diff workshop-step-1..workshop-step-2.

Configure and Build

🏃🏿 Run the project like:

./mvnw clean spring-boot:run

👩‍💻 Open http://localhost:8080 in a browser and ask your chatbot some questions that require specific information you know it doesn't have.

👩‍💻 Open http://localhost:8080/upload in a browser and upload a text or PDF file that contains that specific information.

Ask the same questions again and check that the answers now draw on the new information.

🔍 Explore the data that's been created in AstraDB.

Open the AstraDB console and go to the CQL Console
Run the following CQL commands

USE datastax_ai_agent ;
DESCRIBE table vector_store ;
SELECT id FROM vector_store ;
SELECT id,content FROM vector_store ;
SELECT id,content,embedding FROM vector_store ;