Semantic Search Engine for DocuSign Hackathon - Agreement Trap
A Streamlit-based application that lets users import documents from DocuSign or from other sources such as local files, build their own vector database from them, and run semantic search across the uploaded documents using a Hugging Face sentence-embedding model and Gemini.
- Clone the repository
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables in a .env file
- Run the application:
streamlit run app.py
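The variables the app expects are not listed in the setup steps above. As a minimal sketch, assuming hypothetical names such as PINECONE_API_KEY, GEMINI_API_KEY and HF_API_TOKEN (none of them confirmed by this README), a startup check could report missing settings before the app runs:

```python
import os

# Hypothetical variable names; the real ones depend on the app's code.
REQUIRED_VARS = ("PINECONE_API_KEY", "GEMINI_API_KEY", "HF_API_TOKEN")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or blank in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Values in a .env file only reach os.environ if something loads the file first, for example load_dotenv() from python-dotenv; whether this project does so is an assumption.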
┌─────────────────┐     ┌──────────────┐     ┌────────────────┐
│  Streamlit UI   │────▶│   DocuSign   │────▶│    Document    │
└─────────────────┘     │ /local files │     │   Processing   │
         │              └──────────────┘     └────────────────┘
         │                                           │
         ▼                                           ▼
┌─────────────────┐     ┌──────────────┐     ┌────────────────┐
│  Vector Store   │◀────│  Embedding   │◀────│      Text      │
│   (Pinecone)    │     │   Service    │     │   Extraction   │
└─────────────────┘     └──────────────┘     └────────────────┘
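The flow in the diagram (extracted text, then embeddings, then a vector store that answers queries) can be sketched end to end with stand-ins. This is an illustration under stated assumptions, not the app's actual code: a bag-of-words counter replaces the all-MiniLM-L6-v2 embeddings, a Python list replaces Pinecone, and every name below (chunk_text, toy_embed, InMemoryIndex) is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping windows of words (sizes are illustrative)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
        start += size - overlap
    return chunks


def toy_embed(text):
    """Stand-in for the embedding service: a sparse bag-of-words vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Stand-in for the vector store: keeps (doc_id, chunk, vector) in a list."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.items = []

    def add_document(self, doc_id, text):
        for chunk in chunk_text(text):
            self.items.append((doc_id, chunk, self.embed(chunk)))

    def search(self, query, top_k=3):
        """Return up to top_k (score, doc_id, chunk) tuples, best match first."""
        query_vec = self.embed(query)
        scored = [(cosine(query_vec, vec), doc_id, chunk)
                  for doc_id, chunk, vec in self.items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


index = InMemoryIndex()
index.add_document("lease", "The tenant shall pay rent on the first day of each month.")
index.add_document("nda", "Either party may terminate this agreement with thirty days written notice.")
best_score, best_doc, best_chunk = index.search("terminate agreement notice", top_k=1)[0]
print(best_doc, round(best_score, 3))
```

A bag-of-words vector only matches exact words, so it shows the shape of the pipeline but not the semantic matching that a sentence-embedding model provides.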
- Sentence Transformers: Using all-MiniLM-L6-v2 for generating document embeddings
- Pinecone: Vector similarity search and storage
- Gemini: Generative AI model that produces natural-language responses to user queries
- PyMuPDF (fitz): PDF text extraction
- PyPDF2: PDF reading
- python-docx: Reading Word (.docx) documents
- DocuSign eSignature API
- Streamlit for the user interface
- HuggingFace Inference API
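As a rough sketch of how the document libraries listed above might fit together, the text extraction step could dispatch on file extension. The function names and the EXTRACTORS table are hypothetical, and how the app actually chooses between PyMuPDF and PyPDF2 is not stated here; only PyMuPDF is shown for PDFs.

```python
from pathlib import Path


def extract_pdf_text(path):
    """Extract text from every page of a PDF with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so other formats work without it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_docx_text(path):
    """Extract paragraph text from a Word document with python-docx."""
    import docx  # python-docx; imported lazily for the same reason

    return "\n".join(p.text for p in docx.Document(path).paragraphs)


def extract_plain_text(path):
    """Read a plain-text file, replacing undecodable bytes."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


# Hypothetical mapping from file extension to extractor.
EXTRACTORS = {
    ".pdf": extract_pdf_text,
    ".docx": extract_docx_text,
    ".txt": extract_plain_text,
}


def extract_text(path):
    """Pick an extractor by file extension and return the document's text."""
    suffix = Path(path).suffix.lower()
    extractor = EXTRACTORS.get(suffix)
    if extractor is None:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return extractor(str(path))
```

Calling extract_text("contract.pdf") would return the PDF's text as one string, ready to be split into chunks and embedded.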
I wrote all the code from scratch, with the help of AI tools, solely for this hackathon.
For more information, drop me a message on LinkedIn.
#Docusign #huggingFace #gemini #semantic_search #streamlit