🌌 SuperKnowBa

🎙️ Introduction

Build, manage, and chat with your knowledge base from multiple data sources like CSV, PDF, TXT, DOCX, and more, with the click of a button.

This application leverages OpenAI's ChatGPT and Meta's FAISS to fetch relevant responses to your questions. More model and vector store options are on the way!

Demo

You can try the SuperKnowBa demo here: https://superknowba.streamlit.app/

Warning: Any data you upload in the demo will persist in the vectorDB! Do not upload any private information.

🧐 How it works

SuperKnowBa accepts a variety of file formats, currently CSV, PDF, TXT, and DOCX.

In the side panel, you can create a database and upload files. The text from each file is formatted, split into smaller chunks, and vectorized, and the resulting vectors are stored in a vector database.
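
The chunking step can be sketched roughly as follows. This is a minimal illustration, not the application's actual splitter: the function name `chunk_text`, the character-based splitting, and the `chunk_size` and `overlap` defaults are all assumptions made for this example.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper: consecutive chunks share `overlap` characters so
    that a sentence cut at a boundary still appears whole in one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Each chunk would then be passed to an embedding model before being stored; that part depends on the model and vector store in use and is not shown here.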

Once you create a database, you can select it later to upload more files, or simply chat with it. The database persists in your local directory under superknowba/vectorstores, so you can reuse it every time you start the application. The application automatically scans for any new databases you create.
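
A directory scan like the one described above might look like the sketch below. It assumes each database is saved as its own subdirectory of superknowba/vectorstores; the on-disk layout and the function name `list_databases` are assumptions for illustration, not taken from the application's code.

```python
from pathlib import Path
from typing import List


def list_databases(root: str = "superknowba/vectorstores") -> List[str]:
    """Return the names of all subdirectories under root, sorted by name.

    Hypothetical helper: assumes one subdirectory per persisted database.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing has been created yet, so there are no databases to list.
        return []
    return sorted(p.name for p in root_path.iterdir() if p.is_dir())


if __name__ == "__main__":
    print(list_databases())
```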

When a user asks a question, SuperKnowBa applies similar preprocessing to the question and compares it against the items in the vector store, picking the most relevant ones by semantic similarity. ChatGPT then reformats these results and returns them as a response.

Note that when you chat with a specific database, it can only answer questions related to the data in it. For example, it won't be able to answer the question "Why is the sky blue?" against a database of stock market data. This will be resolved in a future iteration.
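
The retrieval idea can be illustrated with a toy example. The sketch below stands in word counts for real embeddings and a plain Python sort for FAISS, so it only shows the ranking principle; `embed`, `cosine`, and `top_k` are hypothetical names, and the application itself relies on OpenAI and FAISS rather than this code.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the question, best first."""
    query_vector = embed(question)
    scored = [(cosine(query_vector, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    stored = [
        "stock prices rose sharply today",
        "the cat sat on the mat",
        "bond yields fell",
    ]
    for score, chunk in top_k("did stock prices rise", stored, k=1):
        print(round(score, 3), chunk)
```

In a real pipeline, the top-ranked chunks would be passed to the chat model as context for composing the final answer.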

🦾 Installation

Clone the repository
Install dependencies
```
pip install -r requirements.txt
```
Now, you can simply run the following to get started!
```
streamlit run app.py
```

🛠️ Upcoming features

PII Remover option to automatically remove sensitive data according to your needs
Support for different model, embedding, and vectorDB choices
Add fallback option for model used
Speed up CSV loader
Dedicated VectorDB management page, including CRUD operations for files in each DB
Parallelization in API calls, based on your choice to speed up the process
Add caching
Support for more file formats (URL, HTML, YouTube, etc.)
Support for connecting to external databases

🤝 Contributing

Open to pull requests!

Before contributing, install pre-commit and enable the provided config by running the following:

pip install pre-commit
pre-commit install

Then make a pull request.
