OpenVideo specializes in the domain of text-to-video generation, with the goal of providing high-quality and diverse video datasets to AI researchers globally. In addition, it offers comprehensive tools for data collection, cleaning, and annotation, thereby contributing to the advancement of the artificial intelligence industry.
Source | Resolution | Hours | Clips |
---|---|---|---|
Pexels-Raw | 720p | 672h | 106k+ clips |
From ModelScope:
bash git clone https://user_id:[email protected]/datasets/OpenVideo/pexel-0808-complete-final-test.git
From Huggingface:
bash git clone https://user_id:[email protected]/datasets/OpenVideo/pexel-0808-complete-final-test
(user_id is your username, and access_token needs to be generated in the settings)
python ./openvideo/video/preprocess/utils/decode_parquet_file.py --parquet_dir your_parquet_path --save_dir your_save_path
You can install this package via PyPI by entering the following command in the terminal:
$ pip install openvideo
Alternatively, you can obtain the latest version from GitHub:
$ pip install -U https://github.com/UmiMarch/OpenVideo/archive/master.zip # with --user for user install (no root)
The dependencies for OpenVideo
are as follows:
huggingface_hub>=0.22.2
tqdm>=4.66.1
wget>=3.2
requests>=2.31.0
aiohttp>=3.9.3
async_timeout>=4.0.3
moviepy>=1.0.3
opencv-python>=4.9.0.80
selenium>=4.19.0
scenedetect>=0.6.3
texttable>=1.7.0
bs4>=0.0.2
from openvideo.video.fetch import MixkitVideoFetch
mixkit_fetch = MixkitVideoFetch(root_dir="your/video/save/path")
mixkit_fetch.download_with_category_page_idx(
category="sky", # Video category
page_idx=1, # Start downloading from this page
start_idx=22, # Start downloading from this video
platform="linux" # Running platform
)
- Pixabay https://pixabay.com/zh
from openvideo.video.fetch import PixabayVideoFetch
pixabay = PixabayVideoFetch("your/video/save/path")
pixabay.download(
chrome_exe_path=r"your/chrome/exe/path",
username="your/pixabay/username",
password="your/pixabay/password",
headless=False,
platform="windows" # Currently only supports Windows
)
- Pexels https://www.pexels.com/
from openvideo.video.fetch import PexelsVdieoFetch, PexelsAPI
# Step 1: Call the API to obtain video links
pexels_api = PexelsAPI(
api="your/pexels/api",
save_path="pexels_api.npy"
)
pexels_api.fetch_api(
start_page=1, # Starting page
end_page=2, # Ending page
save_api_dict_every_pages=1 # Save the API dictionary every n pages
)
# Step 2: Download videos
pexels = PexelsVdieoFetch("pexels")
pexels.download(
api_npy_save_path="pexels_api.npy",
chrome_exe_path=r"your/chrome/exe/path",
headless=False
)
We have developed a video annotation platform based on the Rust programming language, designed to efficiently generate labels for various media types, including images and videos. This platform supports the invocation of state-of-the-art AI models such as GPT-4o, Gemini, and Claude3, and offers flexible configuration options. It is designed for high performance, capable of processing 100 queries per second, with task processing capacity scalable to 200 million queries. Utilizing 100 API accounts, this tool can synthesize a dataset containing 200,000 videos within 8 hours. All outputs are categorized and organized by model and prompt, ensuring a clear structure for subsequent research and application integration.
(If display issues occur, please try using other browsers. e.g. Edge.)
We provide an annotation validation platform where users can view, validate, and modify annotations for already annotated video datasets.
Usage:
-
Open the HuggingFace link and enter your personal token.
-
Play videos, view the corresponding annotation texts, modify them, than switch to the next video.
For custom datasets:
-
The dataset and code must be on the same platform (e.g., the dataset is hosted on HuggingFace).
-
Modify the dataset path in run.py.
We provide a general-purpose data migration platform for transferring datasets from HuggingFace to ModelScope, facilitating access and usage of datasets across different regional networks.
Usage:
Enter your HuggingFace access token & dataset path, ModelScope access token & repository directory, then click Submit
to run the dataset migration from HuggingFace to the corresponding ModelScope repository automatically.
Crawling Algorithms:@yangming @heatingma @ZZY @晚来风雪
Video Download: @yangming @晚来风雪 @杰杰杰
Data Cleaning:@一马平川 @zjukop @伊小布
Prompt: @Tiger.C @dpyneo @巧克力
Labeling: @YUE @zjukop
Validation Platform: @YUE @晚来风雪
Data Migration: @晚来风雪 @heatingma
Manual Validation: @一马平川 @dpyneo @杨嘉昊 @flipped @yi @believe @思恩
Project Research: @dingby @believe
Aesthetic Guidance: @图拉 @杨嘉昊
Documentation: @ZZY @枪枪
Project Coordination: @巧克力
Server/Financial Support: Li Bai AI Lab
Storage: HuggingFace, ModelScope, OPENDataLab
Techno-sharing:@shoulder @王铁震 @杨欢 @新年京
Discussion:@前仰跳投 @浮羽 @MYX @Winniy @GUI @Planet
This project is licensed under the CC-BY-4.0 open-source license.