
Draft: Feature: Signals to update vector index on page publish #30

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Morsey187 (Collaborator) commented Dec 22, 2023

Adds a new env setting, WAGTAIL_VECTOR_INDEX_UPDATE_ON_PUBLISH, which, when enabled, registers every page model that uses VectorIndexedMixin with Wagtail's page_published signal.
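A minimal sketch of the opt-in registration described above. It assumes the flag is read from the process environment and that values such as "1", "true" or "yes" count as enabled. To stay self-contained it replaces Wagtail's page_published signal and VectorIndexedMixin with small stand-ins; update_on_publish_enabled, update_vector_index and register_publish_handlers are hypothetical names.

```python
import os


class Signal:
    """Tiny stand-in for a Django/Wagtail signal such as page_published.

    Simplified: unlike Django, it does not de-duplicate receivers or pass
    the signal itself to the receiver.
    """

    def __init__(self):
        self._receivers = []

    def connect(self, receiver, sender=None):
        self._receivers.append((receiver, sender))

    def send(self, sender, **kwargs):
        # Like Django, only call receivers registered for this exact sender
        # (or for any sender) and return (receiver, result) pairs.
        return [
            (receiver, receiver(sender=sender, **kwargs))
            for receiver, expected in self._receivers
            if expected is None or expected is sender
        ]


page_published = Signal()


class VectorIndexedMixin:
    """Stub for the mixin that marks a page model as vector indexed."""


class Page:
    """Stub base page class."""


def update_on_publish_enabled(environ=os.environ):
    # Assumed parsing: "1", "true" or "yes" (any case) enable the feature.
    value = environ.get("WAGTAIL_VECTOR_INDEX_UPDATE_ON_PUBLISH", "")
    return value.strip().lower() in {"1", "true", "yes"}


def update_vector_index(sender, instance, **kwargs):
    # Placeholder for the real work (rebuilding or updating the index).
    return f"reindex {sender.__name__} #{instance.pk}"


def register_publish_handlers(page_models, environ=os.environ):
    """Connect the handler to each page model that uses VectorIndexedMixin."""
    if not update_on_publish_enabled(environ):
        return []
    registered = []
    for model in page_models:
        if issubclass(model, VectorIndexedMixin):
            page_published.connect(update_vector_index, sender=model)
            registered.append(model)
    return registered


# Usage with two hypothetical page models.
class BlogPage(VectorIndexedMixin, Page):
    def __init__(self, pk):
        self.pk = pk


class ContactPage(Page):
    pass


flag = {"WAGTAIL_VECTOR_INDEX_UPDATE_ON_PUBLISH": "true"}
registered = register_publish_handlers([BlogPage, ContactPage], environ=flag)
blog_results = page_published.send(sender=BlogPage, instance=BlogPage(pk=3))
contact_results = page_published.send(sender=ContactPage, instance=ContactPage())
```

The sketch connects once per concrete model because Django matches a signal's sender by identity, so a receiver connected with sender set to a base page class would not fire for its subclasses.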

  • Issue: Signal handlers run synchronously as part of the request cycle, and updating indexes can be time-consuming. We should add support for a task queue and consider whether we want to allow using these signals without one at all.
  • Issue: This currently rebuilds the whole index rather than updating it. To update it instead, we'd need to figure out:
    • Which indexes a model is in (so we can update the right indexes)
    • A way to remove documents from an index that match a given set of metadata (the object ID and content type ID in this case)
    • An easier way to generate embeddings on a per-document level, instead of at the rebuild index stage
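One way to keep the slow work out of the request cycle, sketched under assumptions: the handler only records which page was published, and a separate worker does the reindexing. A standard-library queue.Queue and thread stand in for a real task queue (something like Celery or RQ would be more typical in production); make_publish_handler, worker and the (model name, pk) job format are hypothetical. Passing no queue falls back to running inline, which corresponds to using the signals without a task queue at all.

```python
import queue
import threading


def make_publish_handler(reindex, task_queue=None):
    """Build a page_published-style receiver.

    With a queue, the handler only enqueues a small job and returns at once,
    so the request is not held up by the index update. Without a queue, it
    falls back to doing the work inline.
    """

    def handler(sender, instance, **kwargs):
        job = (sender.__name__, instance.pk)
        if task_queue is None:
            reindex(*job)
        else:
            task_queue.put(job)

    return handler


def worker(task_queue, reindex):
    """Process queued jobs in order until a None sentinel is received."""
    while True:
        job = task_queue.get()
        try:
            if job is None:
                return
            reindex(*job)
        finally:
            task_queue.task_done()


# Usage with a hypothetical page class and a stand-in reindex function.
processed = []


def reindex(model_name, pk):
    # Stand-in for the slow embedding and index update.
    processed.append((model_name, pk))


class BlogPage:
    def __init__(self, pk):
        self.pk = pk


jobs = queue.Queue()
handler = make_publish_handler(reindex, task_queue=jobs)
thread = threading.Thread(target=worker, args=(jobs, reindex))
thread.start()

handler(sender=BlogPage, instance=BlogPage(1))
handler(sender=BlogPage, instance=BlogPage(2))
jobs.put(None)  # ask the worker to stop once earlier jobs are drained
thread.join()
```

An in-process thread loses any queued jobs if the web process restarts, which is one reason a persistent task queue would likely be preferable here.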

Morsey187 changed the title from "Feature: Signals to update vector index on page publish" to "Draft: Feature: Signals to update vector index on page publish" on Dec 22, 2023
tomusher (Member) commented Jan 8, 2024

Discussed this with Ben separately, but I'm copying some of those comments here for reference.

This implementation has raised a few potentially challenging points that we might need to address before we can finalise this. Right now, it does a full index rebuild on save, which, as Ben identified, can be very slow.

Ideally, we would only update the current page when saving, but to do this we need to:

  • Identify what indexes the page is part of
  • Add an abstraction for upserting pages: while we can upsert documents right now, a re-indexed page may return a different set of documents, so we need to identify which documents belong to a page, delete them, and then reinsert the new documents.
  • Have an easier way to generate documents on a per-page level: at the moment it's only doable when the whole index is rebuilt
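The three steps above can be sketched against a hypothetical in-memory store. It assumes each index records which models it covers, and that every document carries object_id and content_type_id in its metadata so a page's documents can be found and deleted. InMemoryVectorIndex, delete_matching, indexes_for, upsert_page and the embed_page callable are illustrative names, not the project's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    vector: list
    metadata: dict


@dataclass
class InMemoryVectorIndex:
    """Hypothetical stand-in for a vector store backend."""

    name: str
    models: tuple  # names of the models this index covers (assumed registry)
    documents: list = field(default_factory=list)

    def delete_matching(self, **metadata):
        """Remove every document whose metadata matches all given pairs."""
        before = len(self.documents)
        self.documents = [
            doc for doc in self.documents
            if any(doc.metadata.get(k) != v for k, v in metadata.items())
        ]
        return before - len(self.documents)

    def insert(self, docs):
        self.documents.extend(docs)


def indexes_for(model_name, all_indexes):
    # Step 1: identify which indexes the page's model is part of.
    return [index for index in all_indexes if model_name in index.models]


def upsert_page(page, model_name, content_type_id, all_indexes, embed_page):
    """Steps 2 and 3: delete the page's old documents, then insert new ones."""
    key = {"object_id": page.pk, "content_type_id": content_type_id}
    for index in indexes_for(model_name, all_indexes):
        index.delete_matching(**key)
        # A re-indexed page may yield a different number of chunks, so the
        # whole set is replaced rather than updated document by document.
        index.insert([
            Document(vector=vector, metadata={**key, "chunk": i})
            for i, vector in enumerate(embed_page(page))
        ])


# Usage with a hypothetical page and a fake per-page embedding function.
class Page:
    def __init__(self, pk, body):
        self.pk = pk
        self.body = body


def embed_page(page):
    # Fake one-dimensional "embedding": one vector per paragraph.
    return [[float(len(p))] for p in page.body.split("\n\n")]


blog_index = InMemoryVectorIndex("blog", models=("BlogPage",))
news_index = InMemoryVectorIndex("news", models=("NewsPage",))
other_doc = Document(vector=[1.0], metadata={"object_id": 8, "content_type_id": 12, "chunk": 0})
blog_index.insert([other_doc])

page = Page(pk=7, body="Intro\n\nDetails")
upsert_page(page, "BlogPage", 12, [blog_index, news_index], embed_page)
page.body = "Rewritten intro"
upsert_page(page, "BlogPage", 12, [blog_index, news_index], embed_page)
```

After the second call, page 7 has a single document in the blog index while page 8's document is left untouched, which is the delete-then-reinsert behaviour the comment describes.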
