This is a vibe coding experiment. Its aim is to set up a proof of concept. This software comes with no guarantee.
A Python application for analyzing your Zotero library using topic modeling and RAG (Retrieval-Augmented Generation) technology.
- Topic Modeling: Analyze your PDF documents to discover latent topics
- Interactive Visualizations: Explore topic distributions and trends
- Speak with your PDFs: Ask questions about your documents using RAG technology
- Multiple Languages: Support for English, French, German, and more
- Secure Credentials: Local secure storage of your API keys
- Python 3.10 or higher
- pip (Python package installer)
- A Zotero account with API access enabled
- (Optional) An Anthropic API key for enhanced PDF chat functionality
- (Optional) Ollama for local language model support
# Install Miniconda if you don't have it already
# Create a new environment with Python 3.10
conda create -n zotero-topic python=3.10
conda activate zotero-topic
# Or using venv
python -m venv zotero-topic
# On Windows
zotero-topic\Scripts\activate
# On macOS/Linux
source zotero-topic/bin/activate
git clone [repository-url]
cd zotero_topic_modeling
# Install dependencies
pip install -r requirements.txt
# Start a Python interpreter and run:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
If you want to use local language models instead of the Anthropic API:
- Download and install Ollama from ollama.com
- Pull a model (we recommend starting with a smaller model):
ollama pull llama3.2:3b
- Make sure Ollama is running when you use the application
# Make sure your virtual environment is activated
python -m zotero_topic_modeling.main
- Go to zotero.org/settings/keys
- Click "Create new private key"
- Give it a name (e.g., "Topic Modeling App")
- Make sure to enable:
  - Allow library access (Required)
  - Allow notes access
  - Allow file access
- Click "Save Key"
- Note down both your library ID (found in Feeds/API) and the new API key
- Enter these credentials when prompted by the application
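Before entering the credentials, it can help to see how they are used against the Zotero Web API. The following is a minimal sketch, not the application's own code: it assumes a personal (user) library, and the function name `build_collections_request` is hypothetical. It only builds the request; nothing is sent until you call `urlopen` yourself.

```python
import urllib.request

ZOTERO_API_BASE = "https://api.zotero.org"


def build_collections_request(library_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists a user library's collections.

    Assumes a personal library; group libraries use a /groups/<id> path instead.
    """
    url = f"{ZOTERO_API_BASE}/users/{library_id}/collections"
    headers = {
        "Zotero-API-Key": api_key,      # the private key created above
        "Zotero-API-Version": "3",      # current version of the Zotero Web API
    }
    return urllib.request.Request(url, headers=headers)
```

To try it, pass your own values and send the request with `urllib.request.urlopen(build_collections_request("<library ID>", "<API key>"), timeout=10)`. An HTTP 403 response usually points to a wrong key or missing permissions.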
- Connect to your Zotero account using your credentials
- Select a collection from the tree view
- Choose analysis settings (language, number of topics)
- Click "Process Selected Collection" to analyze the documents
- Explore the topic modeling results when processing completes
- Use "Speak with your PDFs" to ask questions about your documents
This feature allows you to have conversations with your documents using:
- Anthropic Claude API: Higher quality responses (requires API key)
- Local Ollama models: More privacy-focused option (requires Ollama installation)
For detailed instructions, see SPEAK_WITH_PDFS.md
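To illustrate the general idea behind RAG, here is a toy sketch using only the standard library. It is not how this application retrieves passages (that is not described here); real systems typically use embedding models rather than word counts. All function names are hypothetical. The sketch splits text into chunks, ranks them against the question by cosine similarity of word counts, and places the best chunks in a prompt for the language model.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query, bag_of_words(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The resulting prompt would then be sent to whichever backend you chose (Claude or a local Ollama model). Only the retrieved chunks leave the retrieval step, which is why chunk size and `top_k` affect both answer quality and how much document text is shared with the model.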
- Verify your Zotero Library ID and API key
- Ensure your API key has the correct permissions
- Check your internet connection
- Make sure PDFs are accessible in your Zotero library
- Check that PDFs are text-based (not scanned images)
- Try with a smaller collection first
- Make sure Ollama is running when using local models
- Try using a smaller model if you encounter memory issues
- Check that you have the model installed with:
ollama list
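The same checks can be done from Python. This is a minimal sketch, assuming Ollama listens on its default address `http://localhost:11434` and exposes its `/api/tags` endpoint for listing installed models; the function names are hypothetical and not part of this application.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address; adjust if yours differs


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from the JSON returned by /api/tags."""
    return [model["name"] for model in payload.get("models", []) if "name" in model]


def list_ollama_models(base_url: str = OLLAMA_URL, timeout: float = 3.0):
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return parse_model_names(json.load(response))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON
        return None
```

A return value of `None` suggests Ollama is not running; an empty list suggests it is running but no model has been pulled yet.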
- Make sure you're using Python 3.10 or higher
- Check that all dependencies are installed correctly
- On Windows, you might need to install Visual C++ Build Tools
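A quick way to check the first two points is a short script like the one below. It is a sketch with hypothetical helper names; the list of modules to check is up to you (for example `nltk`, which the installation steps import).

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["nltk"])` returns an empty list when NLTK is installed in the active environment. Run it inside the activated environment, since a different interpreter may see different packages.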
- Operating System: Windows 10+, macOS 10.14+, or Linux
- RAM: 8 GB minimum, 16 GB recommended (especially for Ollama)
- Disk Space: At least 8 GB free space (more if using Ollama models)
- Processor: Multi-core processor recommended
- Internet: Required for Zotero API access and Anthropic API
- All PDF processing happens locally on your machine
- When using Anthropic Claude API, document content is sent to Anthropic's servers
- When using Ollama, all processing stays on your local machine
- Credentials are stored securely using your system's keyring/keychain
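As an illustration of keyring-based storage, here is a minimal sketch. It is an assumption about how such storage could look, not this application's actual code: the service name and entry names are hypothetical. The functions accept any backend object offering `set_password` and `get_password`; the `keyring` package's top-level module provides both, so it can be passed in directly.

```python
SERVICE_NAME = "zotero-topic-modeling"  # hypothetical service name


def save_credentials(backend, library_id: str, api_key: str) -> None:
    """Store both values under SERVICE_NAME using the given backend."""
    backend.set_password(SERVICE_NAME, "library_id", library_id)
    backend.set_password(SERVICE_NAME, "api_key", api_key)


def load_credentials(backend):
    """Return (library_id, api_key), or None if either value is missing."""
    library_id = backend.get_password(SERVICE_NAME, "library_id")
    api_key = backend.get_password(SERVICE_NAME, "api_key")
    if library_id is None or api_key is None:
        return None
    return library_id, api_key


class InMemoryBackend:
    """Stand-in with the same two calls, for trying the sketch without keyring."""

    def __init__(self):
        self._store = {}

    def set_password(self, service, username, password):
        self._store[(service, username)] = password

    def get_password(self, service, username):
        return self._store.get((service, username))
```

With the `keyring` package installed, `import keyring` followed by `save_credentials(keyring, "<library ID>", "<API key>")` would write to the operating system's credential store. The in-memory class keeps nothing after the process exits and is only meant for experimentation.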
For issues or questions, please open an issue in the GitHub repository.