Register at https://go.aiven.io/signup-opensearchclip to host your OpenSearch service and get $300 credits for 30 days for all Aiven services.
Press the button to open this repository in GitHub Codespaces:
Or, if you prefer to do it manually: at the top of this GitHub page, above the file browser, select the <> Code button, choose the Codespaces tab, and choose Create codespace on main.

Either way, this will use your GitHub credentials to start a new Codespaces environment. The defaults should be acceptable. Once it's running, it will open a VSCode environment showing this repository, and will automatically run
`pip install -r requirements.txt`
for you to install the Python packages we need.
To connect to our OpenSearch cluster we'll use the URI of your cluster.
- Grab the service URI from the service page of your Aiven for OpenSearch.
- Copy `.env.examples` and rename it to `.env`.
- Set `SERVICE_URI` in `.env` to your cluster's URI (used in the connection sketch below).
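To illustrate how the notebooks can use that value, here is a minimal connection sketch. It assumes the `python-dotenv` and `opensearch-py` packages; the notebooks may structure the client setup differently.

```python
# Minimal sketch: read SERVICE_URI from .env and connect to the cluster.
# Assumes the python-dotenv and opensearch-py packages are installed.
import os

from dotenv import load_dotenv
from opensearchpy import OpenSearch

load_dotenv()  # loads .env so SERVICE_URI appears in the environment

client = OpenSearch(os.environ["SERVICE_URI"], use_ssl=True)
print(client.info())  # quick check that the cluster is reachable
```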
To make sure that OpenSearch recognises vector data and supports KNN search, when creating an index we need to:
- set the `knn` setting to `true`
- specify the name of the property that will hold the vector data, set its type to `knn_vector`, and specify its dimension size (see the sketch after this list)
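Put together, the index creation request might look roughly like the sketch below, reusing the `client` from the connection sketch above. The index name `photos`, the field names, and the dimension 512 (the output size of CLIP's ViT-B/32 model) are assumptions; the notebook may use different values.

```python
# Sketch of creating a KNN-enabled index (names and dimension are assumptions).
index_body = {
    "settings": {
        "index": {
            "knn": True,  # enable k-NN search on this index
        }
    },
    "mappings": {
        "properties": {
            "image_url": {"type": "text"},
            "embedding": {
                "type": "knn_vector",  # the property that holds the vector data
                "dimension": 512,      # must match the CLIP model's output size
            },
        }
    },
}

client.indices.create(index="photos", body=index_body)
```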
Go to 1-prepare-opensearch.ipynb and run the notebook. Install/enable the suggested extensions for Python and Jupyter notebooks, and select the recommended Python environment.
You should see the response
`{'acknowledged': True, 'shards_acknowledged': True, 'index': 'photos'}`
The newly added index will also appear in the list of indexes on the Indexes tab of the Overview page for your Aiven for OpenSearch service.
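If you prefer to check from Python rather than the Aiven console, a quick sanity check (reusing the `client` from the connection sketch, and assuming the index is named `photos`) could be:

```python
# Sanity check: the index exists and has the knn_vector mapping we defined
print(client.indices.exists(index="photos"))       # expect: True
print(client.indices.get_mapping(index="photos"))  # should list the embedding field
```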
In this step we'll load the CLIP model, compute feature vectors for a batch of images and send the data into OpenSearch.
Go to 2-process-and-upload.ipynb and run the notebook steps one by one. The last step will take several minutes to iterate over the photos.
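Conceptually, the work done for each image looks something like the sketch below. It assumes OpenAI's `clip` package, `torch` and `Pillow`, and reuses the `client` from earlier; the file path, field names and lack of batching are illustrative simplifications, not necessarily what the notebook does.

```python
# Sketch: compute a CLIP embedding for one image and index it in OpenSearch.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image_path = "photos/example.jpg"  # hypothetical path to one of the workshop images
image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)

with torch.no_grad():
    features = model.encode_image(image)  # tensor of shape (1, 512)

client.index(
    index="photos",
    body={
        "image_url": image_path,
        "embedding": features[0].tolist(),  # knn_vector fields take a plain list
    },
)
```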
We can use OpenSearch Dashboards to see the contents of the index:
- In the service Overview for the Aiven for OpenSearch service, choose the OpenSearch Dashboards tab
- Copy the Password to your clipboard
- Open the OpenSearch Dashboards URI and log in as user `avnadmin` with the copied password
- Using the three-lines menu at the top left, choose Discover
- Choose "Create index pattern", then "Use default data source" and "Next step"
- It should suggest `photos` as an available index - put `photos` into the index pattern name field, and choose "Next step"
- Choose "Create index pattern"
The next page should show the fields in the `photos` index, including `image_url` and `embedding`.

From the three-lines menu, choose Discover and it should show `photos` selected. After a moment it should show the `_source` entries for the index - all the index entries.
Time to search for an image by providing a text description. For this we'll do the following:
- Translate the text into a vector using the CLIP model.
- Compare this single vector to the vectors for the images that we stored in OpenSearch.
- Retrieve the 3 nearest images to the query vector (sketched after this list).
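In outline, that search step might look like the sketch below. It encodes the text with CLIP and issues an OpenSearch k-NN query, reusing `model`, `device` and `client` from the earlier sketches; the example query string and field name are assumptions.

```python
# Sketch: encode a text query with CLIP and find the 3 nearest image vectors.
text_input = "a dog playing on a beach"  # hypothetical example query

with torch.no_grad():
    tokens = clip.tokenize([text_input]).to(device)
    text_features = model.encode_text(tokens)  # tensor of shape (1, 512)

query = {
    "size": 3,  # return only the 3 nearest images
    "query": {
        "knn": {
            "embedding": {
                "vector": text_features[0].tolist(),
                "k": 3,
            }
        }
    },
}

results = client.search(index="photos", body=query)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["image_url"])
```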
Go to 3-run-vector-search.ipynb and run the notebook steps one by one.
Change the value of `text_input` to search for different images.
- CLIP: Connecting text and images, the OpenAI blog post from 2021 that describes CLIP
- CLIP, the OpenAI GitHub repository for CLIP. This has code examples, and is the basis for this workshop. At the end, it says:
See also
- OpenCLIP: includes larger and independently trained CLIP models up to ViT-G/14
- Hugging Face implementation of CLIP: for easier integration with the HF ecosystem
Other Aiven links:
- When text meets image: a guide to OpenSearch® for multimodal search is the ancestor of this workshop, but uses ~25,000 Unsplash images at their original resolution.
- An app for searching for images matching a text, using CLIP, PostgreSQL® and pgvector presents a Python web app using PostgreSQL® and pgvector to do essentially the same thing as this workshop.