CodeQA is a command-line tool for semantically searching and analyzing your codebase. It embeds your code, retrieves the files most relevant to a natural-language query, and can assemble them into an LLM-ready prompt.
- 🔍 Semantic code search using embeddings
- 📝 Interactive query mode
- 🌲 Project structure awareness
- 📋 LLM-ready prompt generation
- 🎯 Smart file selection for context
- 🚀 GPU/MPS acceleration support
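To make the first feature concrete, here is a minimal sketch of how embedding-based ranking generally works: each file is represented by a vector, and results are ordered by cosine similarity to the query vector. The vectors below are toy values and `rank_by_similarity` is a hypothetical helper; this is not CodeQA's actual implementation, which relies on a real embedding model and an embeddings database.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, file_vecs, n=5):
    """Return the n (path, score) pairs most similar to the query (hypothetical helper)."""
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in file_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_index = {
    "utils/file_utils.py": [0.9, 0.1, 0.0],
    "embeddings/embedder.py": [0.1, 0.9, 0.2],
    "models/code_chunk.py": [0.0, 0.2, 0.9],
}
top = rank_by_similarity([1.0, 0.0, 0.0], toy_index, n=2)
for path, score in top:
    print(f"{score:.3f}  {path}")
```

The `n` parameter plays the same role as the tool's --n option: it caps how many results are returned.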
The project requires Python 3.11+ and the following main dependencies:
chromadb>=0.4.22
torch>=2.0.0
transformers>=4.30.0
pathspec>=0.11.2
numpy
tqdm>=4.65.0
pyperclip
- Clone the repository:
git clone https://github.com/yourusername/codeqa.git
cd codeqa
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Basic usage:
python main.py --dir /path/to/your/codebase
Options:
- --dir: (Required) Path to the codebase directory to analyze
- --db-path: Path to store the embeddings database (default: "code_embeddings")
- --n: Number of top results to return (default: 5)
- --debug: Enable debug logging
- --show-content: Show file contents in results
- --code-extensions: List of file extensions to include (e.g., ".py .js .ts")
- --full-path: Show full file paths in results
- --copy-prompt: Enable prompt generation and copying to clipboard
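For readers who want to see how options like these are typically declared, the following is a sketch using Python's standard argparse module. It mirrors the flags and defaults documented above, but the argument types, the use of store_true for on/off flags, and the None default for --code-extensions are assumptions; the real main.py may define them differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (a sketch, not the real main.py)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Semantic code search over a codebase",
    )
    parser.add_argument("--dir", required=True,
                        help="Path to the codebase directory to analyze")
    parser.add_argument("--db-path", default="code_embeddings",
                        help="Path to store the embeddings database")
    parser.add_argument("--n", type=int, default=5,
                        help="Number of top results to return")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug logging")
    parser.add_argument("--show-content", action="store_true",
                        help="Show file contents in results")
    # Assumption: extensions are passed space-separated, as in the usage examples.
    parser.add_argument("--code-extensions", nargs="+", default=None,
                        help='File extensions to include, e.g. ".py .js .ts"')
    parser.add_argument("--full-path", action="store_true",
                        help="Show full file paths in results")
    parser.add_argument("--copy-prompt", action="store_true",
                        help="Enable prompt generation and copying to clipboard")
    return parser


args = build_parser().parse_args(
    ["--dir", "./myproject", "--code-extensions", ".py", ".ts"]
)
print(args.dir, args.db_path, args.n, args.code_extensions)
```

Note that argparse converts dashes in option names to underscores, so --db-path is read back as `args.db_path`.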
After starting the tool, you can:
- Enter your queries about the codebase
- View matched files with relevance scores
- When using --copy-prompt:
  - Select specific files to include in the context
  - Get a token count for the generated prompt
  - Have the prompt automatically copied to the clipboard
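The --copy-prompt workflow can be pictured roughly as follows. This is only a sketch under stated assumptions: `build_prompt`, `estimate_tokens`, and `copy_to_clipboard` are hypothetical helpers, the prompt layout is invented for illustration, and the four-characters-per-token estimate is a crude rule of thumb rather than the tool's actual token counter. The only third-party call used is `pyperclip.copy`, from the pyperclip dependency listed above.

```python
def build_prompt(question, files):
    """Join selected files and the question into one prompt (layout is an assumption)."""
    sections = [f"### {path}\n{content}" for path, content in files.items()]
    return "Context files:\n\n" + "\n\n".join(sections) + f"\n\nQuestion: {question}\n"


def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)


def copy_to_clipboard(text):
    """Copy text with pyperclip if possible; return False when no clipboard is available."""
    try:
        import pyperclip
        pyperclip.copy(text)
        return True
    except Exception:
        # pyperclip is missing or the system has no clipboard mechanism (e.g. headless).
        return False


# Hypothetical selection of files chosen by the user.
selected = {
    "auth.py": "def login(user, password):\n    ...",
    "session.py": "def create_session(user):\n    ...",
}
prompt = build_prompt("explain the authentication system", selected)
print(f"Estimated tokens: {estimate_tokens(prompt)}")
```

Calling `copy_to_clipboard(prompt)` afterwards would place the prompt on the clipboard where one is available.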
Search for file handling code:
python main.py --dir ./myproject
> how does the file reading work
Generate an LLM-ready prompt:
python main.py --dir ./myproject --copy-prompt
> explain the authentication system
Analyze specific file types:
python main.py --dir ./myproject --code-extensions .py .ts .js
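Restricting analysis to certain file types amounts to filtering the directory walk by suffix. The sketch below shows one way to do that with the standard pathlib module; `find_code_files` is a hypothetical helper, and the case-insensitive matching is an assumption, not documented CodeQA behavior.

```python
from pathlib import Path


def find_code_files(root, extensions):
    """Return sorted paths under root whose suffix is in extensions (hypothetical helper)."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )


if __name__ == "__main__":
    # Equivalent in spirit to: --dir ./myproject --code-extensions .py .ts .js
    for path in find_code_files("./myproject", [".py", ".ts", ".js"]):
        print(path)
```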
codeqa/
├── main.py                  # Main entry point
├── utils/
│   └── file_utils.py        # File handling utilities
├── embeddings/
│   └── embedder.py          # Code embedding generation
├── tokenization/
│   └── chunker.py           # Code chunking logic
└── models/
    └── code_chunk.py        # Data models
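To give a feel for how a chunker and a chunk data model might fit together, here is a minimal sketch that splits a source file into fixed-size line ranges. The `CodeChunk` fields, the `chunk_lines` function, and the fixed 40-line window are all assumptions made for illustration; the actual classes in models/code_chunk.py and tokenization/chunker.py may look quite different.

```python
from dataclasses import dataclass


@dataclass
class CodeChunk:
    """A contiguous slice of a source file (field names are hypothetical)."""
    file_path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_lines(file_path, source, max_lines=40):
    """Split source into consecutive chunks of at most max_lines lines."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        chunks.append(
            CodeChunk(file_path, start + 1, start + len(block), "\n".join(block))
        )
    return chunks


sample = "\n".join(f"line {i}" for i in range(1, 6))
for chunk in chunk_lines("example.py", sample, max_lines=2):
    print(chunk.file_path, chunk.start_line, chunk.end_line)
```

Each chunk keeps its line range so that search results can point back to a location in the original file.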
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.