This document provides detailed instructions for installing the QA Dataset Clustering Toolkit (qadst).
Before installing qadst, ensure you have the following prerequisites:
- Python 3.10 or higher
- Poetry for dependency management
- OpenAI API key (for embedding models and LLM-based features)
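The first two prerequisites can be checked with a short script before you install. This is a minimal sketch and is not part of qadst. The function name check_prerequisites is hypothetical. It checks only the interpreter you run it with, which may differ from the one Poetry selects, and whether a poetry executable is on PATH. The OpenAI key is left out because this guide stores it in .env rather than in the shell environment.

```python
import shutil
import sys


def check_prerequisites(version_info=None, which=None):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper for illustration; it is not shipped with qadst.
    The keyword arguments exist only so the checks can be run with fake inputs.
    """
    version_info = sys.version_info if version_info is None else version_info
    which = shutil.which if which is None else which
    problems = []

    # qadst requires Python 3.10 or higher (checks the interpreter running this script).
    if tuple(version_info[:2]) < (3, 10):
        problems.append(
            f"Python 3.10+ required, found {version_info[0]}.{version_info[1]}"
        )

    # Poetry must be installed and discoverable on PATH.
    if which("poetry") is None:
        problems.append("Poetry not found on PATH")

    return problems


issues = check_prerequisites()
if issues:
    for issue in issues:
        print(f"- {issue}")
else:
    print("All prerequisite checks passed.")
```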
Clone the repository:
git clone https://github.com/sergeyklay/qa-dataset-clustering.git
cd qa-dataset-clustering
Install qadst and its dependencies using Poetry:
poetry install
This will create a virtual environment and install all required dependencies.
Create a .env file from the template:
cp .env.example .env
Edit the .env file to include your API keys and other configuration:
OPENAI_API_KEY=your_openai_api_key
OPENAI_MODEL=gpt-4o
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
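The .env file uses plain KEY=VALUE lines, so a quick sanity check of your edits is easy to script. The sketch is a hypothetical, standard-library-only checker. It does not reproduce how qadst itself loads the file. The names parse_env_text, find_env_problems and EXPECTED_KEYS are illustrative. Treating all three keys from the example as required is an assumption.

```python
from pathlib import Path

# Keys from the example .env; treating all of them as required is an assumption.
EXPECTED_KEYS = ("OPENAI_API_KEY", "OPENAI_MODEL", "OPENAI_EMBEDDING_MODEL")
PLACEHOLDER = "your_openai_api_key"


def parse_env_text(text):
    """Parse simple KEY=VALUE lines (hypothetical helper, not part of qadst).

    Blank lines and '#' comments are skipped. There is deliberately no support for
    variable expansion, multi-line values, or 'export' prefixes.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def find_env_problems(values):
    """Return a list of problems with parsed .env values; empty means none found."""
    problems = [
        f"{key} is missing or empty" for key in EXPECTED_KEYS if not values.get(key)
    ]
    if values.get("OPENAI_API_KEY") == PLACEHOLDER:
        problems.append("OPENAI_API_KEY still contains the template placeholder")
    return problems


# Check the .env file in the current directory, if there is one.
env_path = Path(".env")
if env_path.exists():
    for problem in find_env_problems(parse_env_text(env_path.read_text(encoding="utf-8"))):
        print(f"- {problem}")
```

Run it from the repository root after editing .env. The placeholder check catches the common mistake of copying the template without replacing your_openai_api_key.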
To verify that qadst is installed correctly, run:
poetry run qadst --help
You should see the help message with available commands and options.
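If you want the same check inside a setup script or CI job, you can run the command and test its exit status instead of reading the help text. This is a minimal sketch under the assumption that a zero exit status from poetry run qadst --help means the install is usable. The names command_succeeds, verify_qadst and QADST_HELP are hypothetical.

```python
import subprocess
import sys

# The manual verification command from this guide.
QADST_HELP = ["poetry", "run", "qadst", "--help"]


def command_succeeds(cmd, timeout=60):
    """Run cmd and report whether it exited with status 0.

    A missing executable (e.g. Poetry not on PATH) or a timeout counts as
    failure instead of raising. Hypothetical helper, not part of qadst.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def verify_qadst(cmd=QADST_HELP):
    """Print a one-line status (errors go to stderr) and return True on success."""
    ok = command_succeeds(cmd)
    stream = sys.stdout if ok else sys.stderr
    print("qadst is installed" if ok else f"'{' '.join(cmd)}' failed", file=stream)
    return ok
```

Call verify_qadst() from the repository root, where Poetry can find the project's virtual environment.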
If you're planning to contribute to qadst, install with development dependencies:
poetry install --with dev
This will install additional tools for testing, linting, and documentation.
- Poetry not found: Ensure Poetry is installed and in your PATH
- Python version mismatch: Verify you have Python 3.10+ installed
- Dependency conflicts: Try running poetry update to resolve dependency issues
If you encounter any other installation issues, please check the GitHub repository for known issues or open a new issue with details about your environment and the error messages you're seeing.