An intelligent video retrieval system leveraging Large Language Models (LLMs) and multimodal search, developed for the AIC2024 competition and accepted at the international SOICT 2024 conference.
The LLM-Powered Video Search System is a multimodal video search solution that uses Large Language Models (LLMs) to improve video retrieval through text, image, and metadata queries. The project was developed for the AIC2024 competition and has been accepted at the international SOICT 2024 conference. Details about the paper can be found on Springer.
- Multimodal Search Capabilities
  - Text-based search: Supports ASR (Automatic Speech Recognition), OCR, captions, and descriptive image queries for improved accuracy.
  - Image-based search: Enables users to find specific video segments based on images.
  - Metadata-based search: Provides a 7x7 matrix for tagging objects and color attributes for contextual search.
- LLM-Powered Interaction
  - Integrates LLMs (e.g., GPT-4) to handle natural language queries and deliver relevant search results tailored to the context.
- User-Friendly Interface
  - A responsive user interface allows users to view results as keyframes or full video segments and interact with detailed metadata.
- Back-end: Django
- Core Technologies: CLIP, Faiss, TF-IDF
- Supporting Technologies: OpenCV, PyTorch, Transformers
- Development Tools: Docker, Git, Jupyter Notebook
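
Of the core technologies listed above, TF-IDF is the easiest to illustrate in isolation. The sketch below is a minimal, dependency-free version of TF-IDF ranking over transcript text, such as what ASR or OCR might produce. It is not the project's implementation: the function names, the keyframe IDs, and the sample transcripts are all made up for illustration, and a real system would likely rely on a library vectorizer and a proper tokenizer.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def weight(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen at index time are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def build_index(docs):
    """Build (idf, vectors) from a dict of {doc_id: text}."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    # Smoothed IDF, so a term found in every document keeps a positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = {doc_id: weight(tokens, idf) for doc_id, tokens in tokenized.items()}
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, idf, vectors, top_k=3):
    """Return up to top_k document IDs with non-zero similarity, best first."""
    q = weight(tokenize(query), idf)
    scored = [(cosine(q, vec), doc_id) for doc_id, vec in vectors.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)[:top_k]]


# Hypothetical transcripts keyed by hypothetical keyframe IDs.
transcripts = {
    "kf_city": "the reporter talks about the flood in the city",
    "kf_kitchen": "a chef cooks noodles in the kitchen",
    "kf_bridge": "flood water rises near the bridge",
}
idf, vectors = build_index(transcripts)
print(search("flood bridge", idf, vectors))
```

Documents that share no term with the query are filtered out, so the kitchen transcript never appears for this query.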
- Clone Repository
      git clone https://github.com/HTAnh2003/LLM_Powered_Video_Search.git
      cd LLM_Powered_Video_Search
- Install Dependencies
  Ensure Python and Django are installed. Then, install the other dependencies from requirements.txt:
      pip install -r requirements.txt
- Configure MEDIA_ROOT
  Open settings.py in the AIC/ folder and set MEDIA_ROOT to point to your local media directory:
      MEDIA_ROOT = '/path/to/your/media'
  You can download the dataset from Google Drive or Kaggle.
- Verify Paths in viewAPI.py
  Ensure the paths in app/viewAPI.py are correct.
- Run Migrations
  Update the database with migrations:
      python manage.py migrate
- Run the Application
  To start the application, use:
      python manage.py runserver
  The app will run by default at http://127.0.0.1:8000/.
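
For the MEDIA_ROOT step above, a hard-coded absolute path works but has to be edited on every machine. The sketch below shows one possible alternative: derive the path from the location of settings.py and allow an environment variable to override it. The helper name `resolve_media_root` and the variable name `VIDEO_SEARCH_MEDIA_ROOT` are hypothetical, and the sketch assumes the media directory sits next to manage.py at the repository root; the project itself may be organised differently.

```python
import os
from pathlib import Path


def resolve_media_root(settings_file, env=os.environ):
    """Pick the media directory for the Django settings module.

    Assumption: settings_file is AIC/settings.py, so two levels up is the
    repository root. VIDEO_SEARCH_MEDIA_ROOT is a hypothetical override
    variable, not something the project defines.
    """
    base_dir = Path(settings_file).resolve().parent.parent
    return Path(env.get("VIDEO_SEARCH_MEDIA_ROOT", base_dir / "media"))


# Inside AIC/settings.py this could be used as:
#   MEDIA_ROOT = resolve_media_root(__file__)
```

With this in place, setting `VIDEO_SEARCH_MEDIA_ROOT=/path/to/your/media` before running `python manage.py runserver` would point the app at a dataset stored elsewhere without touching the settings file.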
- Data Processing: Video data is processed using ASR or extracted via TransnetV2, then converted into image features and metadata.
- LLM-Powered Interaction: Natural language queries are processed by the LLM and combined with image features and metadata for relevant video retrieval.
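
To make the combination of image features and metadata more concrete, here is a small sketch of one possible strategy: filter keyframes by their 7x7 grid tags first, then rank the survivors by cosine similarity to the query embedding. This is an illustration under assumptions, not a description of what combine_search.py does. The `Keyframe` class, the `(row, col, label)` tag layout, the function names, and the two-dimensional toy embeddings are all hypothetical, and the brute-force ranking stands in for an index such as Faiss.

```python
import math
from dataclasses import dataclass, field

GRID_SIZE = 7  # the 7x7 matrix used for object and color tags


@dataclass
class Keyframe:
    frame_id: str
    embedding: list                           # image feature vector, e.g. from CLIP
    tags: set = field(default_factory=set)    # {(row, col, label), ...}


def cell_of(x, y, width, height):
    """Map a pixel position to its (row, col) cell in the grid."""
    col = min(int(x / width * GRID_SIZE), GRID_SIZE - 1)
    row = min(int(y / height * GRID_SIZE), GRID_SIZE - 1)
    return row, col


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_search(query_embedding, required_tags, keyframes, top_k=5):
    """Keep keyframes that carry every required tag, then rank by similarity."""
    candidates = [kf for kf in keyframes if required_tags <= kf.tags]
    ranked = sorted(
        candidates,
        key=lambda kf: cosine(query_embedding, kf.embedding),
        reverse=True,
    )
    return [kf.frame_id for kf in ranked[:top_k]]


# Toy data: a car tagged in the centre cell of a 1280x720 frame.
centre = cell_of(640, 360, 1280, 720)
keyframes = [
    Keyframe("kf_a", [1.0, 0.0], {(*centre, "car")}),
    Keyframe("kf_b", [0.6, 0.8], {(*centre, "car"), (0, 0, "blue")}),
    Keyframe("kf_c", [0.9, 0.1]),
]
print(combined_search([1.0, 0.0], {(*centre, "car")}, keyframes))
```

Filtering before ranking keeps the example simple; a weighted sum of a metadata score and a similarity score would be an equally plausible design when tags are noisy and should not exclude results outright.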
├── LLM_Powered_Video_Search/
│ ├── AIC/
│ │ ├── settings.py
│ ├── app/
│ │ ├── admin.py
│ │ ├── data_utils.py
│ │ ├── migrations/
│ │ ├── static/
│ │ ├── templates/
│ │ ├── viewAPI.py
│ ├── data_extraction/
│ │ ├── TransnetV2/
│ │ ├── audio/
│ │ ├── metadata/
│ ├── docker-compose.yml
│ ├── figs/
│ ├── manage.py
│ ├── requirements.txt
│ ├── utils/
│ ├── LLM/
│ ├── video_retrieval/
│ ├── faiss_search.py
│ ├── combine_search.py
│ │ ...