[Feature Request]: Enhance Docling Query Engine: Add PGVector, MongoDB, and Qdrant Support via VectorDBFactory Wrapper #950

Open
sitloboi2012 opened this issue Feb 13, 2025 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@sitloboi2012

Is your feature request related to a problem? Please describe.

The current query engine implementation (see docling_query_engine.py) uses ChromaDB, wrapping its collection in a LlamaIndex ChromaVectorStore for indexing. Meanwhile, the VectorDBFactory class provides a mechanism to create vector database storage with various backends. To improve flexibility and meet the RAG objectives outlined in [Feature Request]: Docling data ingestion to RAG (#688), we need to extend the query engine so it can use backends other than ChromaDB.

Describe the solution you'd like

  1. Review Existing Implementation:
  • Examine the current ChromaDB-based query engine implementation.
  • Understand how the VectorDBFactory maps the user-selected vector DB to a corresponding LlamaIndex VectorStore.
  2. Implement Additional Support:
  • Develop wrappers or integration logic for alternative vector databases, specifically PGVector, MongoDB, and Qdrant.
  • Ensure that these new wrappers map configuration options correctly to the LlamaIndex-supported VectorStore interfaces.
  3. Integration & Testing:
  • Integrate the new wrappers with the existing query engine interface.
  • Test functionality within the context of the DocumentAgent (Phase 1) (#438) and ensure compatibility with RAG capabilities.
  • Update documentation and examples to reflect the extended support.

Additional context

This enhancement is part of our ongoing effort to make the agent more versatile and not limited to a single vector DB. It builds on recent work (e.g., the merged ChromaDB implementation) and aligns with upcoming changes in retrieve_user_proxy_agent.py to support multiple query engines.

@sitloboi2012 added the enhancement (New feature or request) label Feb 13, 2025
@sitloboi2012
Author

@AgentGenie @Eric-Shang please help me review this issue and assign it to me 😃 This will be a separate sub-issue that builds on and extends the previous work in #688 and #941.

Projects
Status: Waiting for merge
Development

No branches or pull requests

1 participant