Leveraging the Power of Large Language Models and the Langchain Framework for an Innovative Approach to Document Querying
This project implements a document-based question-answering system built on OpenAI's GPT-3.5 Turbo model, Python, and the LangChain framework. It processes PDF documents, breaking them into ingestible chunks, and stores those chunks in a Chroma vector database for querying. It complements a Medium article called How to Build a Document-Based Q&A System Using OpenAI and Python.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
To install the project, you need Python installed on your machine.
The project uses Poetry for managing dependencies. After cloning the repository, navigate to the project directory and install dependencies with the following commands:
poetry install
poetry shell
Before you can ingest or query documents, make sure that a .env file exists. This file should contain a single line that reads:

    OPENAI_API_KEY=yourkey
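The ingest and query scripts can pick this key up at startup via python-dotenv. A minimal sketch, assuming the python-dotenv package is installed and the .env file sits in the working directory:

    import os

    from dotenv import load_dotenv  # provided by the python-dotenv package

    load_dotenv()  # reads OPENAI_API_KEY from the .env file into the environment
    api_key = os.environ["OPENAI_API_KEY"]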
To ingest documents, place your PDF files in the 'docs' folder, change into the app folder, and run the following commands:
cd app
python ingest.py
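The repository's ingest.py is the source of truth for what actually runs. As a rough sketch only, an ingestion script built on the classic LangChain API often looks like the following; module paths differ between LangChain versions, and the docs path, chunk sizes, and "db" persist directory are illustrative assumptions, not taken from this repo:

    import glob

    from dotenv import load_dotenv
    from langchain.document_loaders import PyPDFLoader
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.vectorstores import Chroma

    load_dotenv()  # makes OPENAI_API_KEY available to the embeddings client

    # Load every PDF from the docs folder (path is an assumption; adjust it
    # to wherever docs lives relative to the app folder).
    documents = []
    for path in glob.glob("../docs/*.pdf"):
        documents.extend(PyPDFLoader(path).load())

    # Break the documents into ingestible, overlapping chunks.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(documents)

    # Embed the chunks and persist them in a local Chroma database.
    db = Chroma.from_documents(chunks, OpenAIEmbeddings(), persist_directory="db")
    db.persist()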
To query the ingested documents, change into the app folder, run the following commands, and follow the interactive prompts:
cd app
python query.py
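Again, query.py defines the real behavior. A minimal sketch of a query script over the same assumed Chroma database, using the classic LangChain RetrievalQA chain, might look like this:

    from dotenv import load_dotenv
    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    load_dotenv()

    # Reopen the persisted Chroma database and wrap it in a retrieval chain.
    db = Chroma(persist_directory="db", embedding_function=OpenAIEmbeddings())
    qa = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
        chain_type="stuff",  # stuff retrieved chunks directly into the prompt
        retriever=db.as_retriever(),
    )

    # Simple interactive loop: ask questions until an empty line is entered.
    while True:
        question = input("Question (blank line to quit): ").strip()
        if not question:
            break
        print(qa.run(question))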
To visualize and interact with the system via the Streamlit app, run the following command:
streamlit run streamlit_app.py
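The real UI lives in streamlit_app.py. Conceptually, a minimal Streamlit front end over the same hypothetical retrieval chain can be as small as the sketch below; the page title and "db" persist directory are assumptions:

    import streamlit as st
    from dotenv import load_dotenv
    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    load_dotenv()
    st.title("Document Q&A")

    # Cache the chain so Streamlit does not rebuild it on every rerun.
    @st.cache_resource
    def build_chain():
        db = Chroma(persist_directory="db", embedding_function=OpenAIEmbeddings())
        return RetrievalQA.from_chain_type(
            llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
            retriever=db.as_retriever(),
        )

    question = st.text_input("Ask a question about your documents")
    if question:
        with st.spinner("Thinking..."):
            st.write(build_chain().run(question))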
This project is licensed under the MIT License - see the LICENSE.md file for details.