
A semantic search engine for DocuSign documents, implemented with Hugging Face and Pinecone.


Sppdd/Semantic-Search-Engine


Semantic Search Engine for DocuSign Hackathon - Agreement Trap

A Streamlit-based application that lets you import documents from DocuSign or from other sources, such as local files, build your own vector database from them, and run semantic search across the uploaded documents using a Hugging Face sentence-embedding model and the Gemini model.

YouTube Demo

Setup and Installation

  1. Clone the repository
  2. Create a virtual environment:
    python -m venv venv
    source venv/bin/activate  # or venv\Scripts\activate on Windows
  3. Install dependencies:
    pip install -r requirements.txt
  4. Set up environment variables in a .env file
  5. Run the application:
    streamlit run app.py
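
Step 4 does not say which variables the app expects. As a minimal sketch, the check below assumes two hypothetical names, PINECONE_API_KEY and GEMINI_API_KEY, and assumes the values from .env have already been loaded into the process environment (for example with python-dotenv). Adjust the names to whatever app.py actually reads.

```python
import os

# Hypothetical variable names: the repository does not list the keys it expects.
REQUIRED_VARS = ("PINECONE_API_KEY", "GEMINI_API_KEY")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

Calling missing_env_vars() before starting Streamlit gives a clear list of what is still missing instead of a failure deep inside an API client.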

Service Architecture

┌─────────────────┐     ┌──────────────┐     ┌────────────────┐
│  Streamlit UI   │────▶│  DocuSign    │────▶│  Document      │
└─────────────────┘     │  /local files│     │  Processing    │
         │              └──────────────┘     └────────────────┘
         │                                           │
         ▼                                           ▼
┌─────────────────┐     ┌──────────────┐     ┌────────────────┐
│  Vector Store   │◀────│  Embedding   │◀────│  Text          │
│  (Pinecone)     │     │  Service     │     │  Extraction    │
└─────────────────┘     └──────────────┘     └────────────────┘
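
The flow in the diagram (extract text, embed it, store the vectors, then query them) can be sketched roughly as follows. This is not the repository's code: InMemoryIndex is a hypothetical stand-in for the Pinecone index, the chunk size and overlap are arbitrary, and the embedding function is passed in so the sketch runs without any model. minilm_embed shows how all-MiniLM-L6-v2 could be plugged in when the sentence-transformers package is installed.

```python
import math


def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Hypothetical stand-in for the vector store: keeps (id, vector, text) records."""

    def __init__(self, embed):
        self.embed = embed  # callable: list[str] -> list[list[float]]
        self.records = []

    def upsert(self, doc_id, text, size=200, overlap=50):
        """Chunk a document, embed each chunk, and store the results."""
        chunks = chunk_text(text, size, overlap)
        for n, (chunk, vec) in enumerate(zip(chunks, self.embed(chunks))):
            self.records.append((f"{doc_id}#{n}", vec, chunk))

    def query(self, question, top_k=3):
        """Return the top_k (score, id, text) records ranked by cosine similarity."""
        qvec = self.embed([question])[0]
        scored = [(cosine(qvec, vec), rid, chunk) for rid, vec, chunk in self.records]
        return sorted(scored, key=lambda r: r[0], reverse=True)[:top_k]


def minilm_embed(texts):
    """Embed with all-MiniLM-L6-v2; requires the sentence-transformers package."""
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2").encode(texts).tolist()
```

With a real setup, InMemoryIndex(minilm_embed) would be replaced by calls to the Pinecone client; the chunk, embed, store, and query steps keep the same shape.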

Technology Stack

Machine Learning Models
  • Sentence Transformers: Using all-MiniLM-L6-v2 for generating document embeddings
Vector Database
  • Pinecone: Vector similarity search and storage
AI Model
  • Gemini: language model that generates natural-language responses to user queries
Document Processing
  • PyMuPDF (fitz)
  • PyPDF2
  • python-docx
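
As a rough sketch of how the document-processing libraries above might be combined, the function below dispatches on file extension. It uses PyMuPDF for PDFs and python-docx for Word files; PyPDF2 is left out of the sketch, and the .txt branch is an assumption added for local plain-text files. extract_text is a hypothetical name, not a function from the repository.

```python
from pathlib import Path


def extract_text(path):
    """Return the plain text of a .pdf, .docx, or .txt file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        import fitz  # PyMuPDF, imported lazily so other formats work without it
        with fitz.open(str(path)) as doc:
            return "\n".join(page.get_text() for page in doc)
    if suffix == ".docx":
        import docx  # python-docx
        return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    raise ValueError(f"Unsupported file type: {suffix or path.name}")
```

Importing each library inside its own branch keeps the function usable when only some of the dependencies are installed.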

Integration

I wrote all the code from scratch, with the help of AI tools, specifically for this hackathon.

For more information, drop me a message on LinkedIn.

#Docusign #huggingFace #gemini #semantic_search #streamlit
