UmiMarch/OpenVideo

OpenVideo specializes in the domain of text-to-video generation, with the goal of providing high-quality and diverse video datasets to AI researchers globally. In addition, it offers comprehensive tools for data collection, cleaning, and annotation, thereby contributing to the advancement of the artificial intelligence industry.

Chinese homepage

📚Dataset

Source	Resolution	Hours	Clips
Pexels-Raw	720p	672h	106k+ clips

Download:

From ModelScope:

git clone https://user_id:access_token@www.modelscope.cn/datasets/OpenVideo/pexel-0808-complete-final-test.git

From HuggingFace:

git clone https://user_id:access_token@huggingface.co/datasets/OpenVideo/pexel-0808-complete-final-test

(user_id is your username, and access_token needs to be generated in the settings)
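Because the credentials are embedded in the clone URL, a username or token containing characters such as `@` or `/` needs percent-encoding first. The following is a minimal sketch of that step, not part of OpenVideo: the helper name `build_clone_url` is hypothetical, and the host passed in the example is an assumption you should replace with the platform you are cloning from.

```python
from urllib.parse import quote


def build_clone_url(host: str, repo_path: str, user_id: str, access_token: str) -> str:
    """Return an HTTPS clone URL with percent-encoded credentials.

    Hypothetical helper for illustration; it is not part of the openvideo package.
    """
    # safe="" also encodes "/" and "@", which would otherwise break URL parsing.
    user = quote(user_id, safe="")
    token = quote(access_token, safe="")
    return f"https://{user}:{token}@{host}/{repo_path.lstrip('/')}"


if __name__ == "__main__":
    url = build_clone_url(
        host="huggingface.co",  # assumed host; use your platform's host instead
        repo_path="datasets/OpenVideo/pexel-0808-complete-final-test",
        user_id="user_id",
        access_token="access_token",
    )
    print(f"git clone {url}")
```

Keep in mind that a URL built this way contains your token in plain text, so avoid pasting it into logs or shared shell history.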

Script:

python ./openvideo/video/preprocess/utils/decode_parquet_file.py --parquet_dir your_parquet_path --save_dir your_save_path
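If you want to call the script from Python rather than the shell, a small wrapper can check the inputs before launching it. This is only a sketch under assumptions: the helper name `build_decode_command` is hypothetical, and it assumes the `.parquet` files sit directly inside `parquet_dir`. The `--parquet_dir` and `--save_dir` flags are the ones shown in the command above.

```python
import sys
from pathlib import Path


def build_decode_command(script: str, parquet_dir: str, save_dir: str) -> list[str]:
    """Build the argument list for the decode script (hypothetical helper)."""
    parquet_path = Path(parquet_dir)
    # Fail early if there is nothing to decode.
    # Assumption: the parquet files are not nested in subdirectories.
    if not any(parquet_path.glob("*.parquet")):
        raise FileNotFoundError(f"no .parquet files found in {parquet_path}")
    # Create the output directory up front; the script may also do this itself.
    Path(save_dir).mkdir(parents=True, exist_ok=True)
    return [
        sys.executable, script,
        "--parquet_dir", str(parquet_path),
        "--save_dir", str(save_dir),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if the script exits with a non-zero status.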

⚡Tools

You can install this package via PyPI by entering the following command in the terminal:

$ pip install openvideo

Alternatively, you can obtain the latest version from GitHub:

$ pip install -U https://github.com/UmiMarch/OpenVideo/archive/master.zip # with --user for user install (no root)

The dependencies for OpenVideo are as follows:

huggingface_hub>=0.22.2
tqdm>=4.66.1
wget>=3.2
requests>=2.31.0
aiohttp>=3.9.3
async_timeout>=4.0.3
moviepy>=1.0.3
opencv-python>=4.9.0.80
selenium>=4.19.0
scenedetect>=0.6.3
texttable>=1.7.0
bs4>=0.0.2
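To confirm that an existing environment meets these minimums, you can compare installed versions against the list. The sketch below uses only the standard library; the function names are hypothetical, and the version parsing is deliberately simplified (it ignores pre-release suffixes), so treat it as a quick check rather than a full resolver.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above.
REQUIREMENTS = {
    "huggingface_hub": "0.22.2",
    "tqdm": "4.66.1",
    "wget": "3.2",
    "requests": "2.31.0",
    "aiohttp": "3.9.3",
    "async_timeout": "4.0.3",
    "moviepy": "1.0.3",
    "opencv-python": "4.9.0.80",
    "selenium": "4.19.0",
    "scenedetect": "0.6.3",
    "texttable": "1.7.0",
    "bs4": "0.0.2",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '4.9.0.80' into (4, 9, 0, 80); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_older(installed: str, minimum: str) -> bool:
    """Compare two versions, padding with zeros so '1.0' equals '1.0.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_requirements(requirements: dict[str, str]) -> dict[str, str]:
    """Return a {package: problem} mapping for missing or outdated packages."""
    problems = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if is_older(installed, minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    for package, problem in check_requirements(REQUIREMENTS).items():
        print(f"{package}: {problem}")
```

An empty result means every listed package is installed at or above its minimum version.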

Video Download

Mixkit https://mixkit.co/free-stock-video/

from openvideo.video.fetch import MixkitVideoFetch
 
mixkit_fetch = MixkitVideoFetch(root_dir="your/video/save/path")
mixkit_fetch.download_with_category_page_idx(
    category="sky", # Video category
    page_idx=1, # Start downloading from this page
    start_idx=22, # Start downloading from this video
    platform="linux" # Running platform
)

Pixabay https://pixabay.com/zh

from openvideo.video.fetch import PixabayVideoFetch

pixabay = PixabayVideoFetch("your/video/save/path")
pixabay.download(
    chrome_exe_path=r"your/chrome/exe/path",
    username="your/pixabay/username",
    password="your/pixabay/password",
    headless=False,
    platform="windows" # Currently only supports Windows
)

Pexels https://www.pexels.com/

from openvideo.video.fetch import PexelsVdieoFetch, PexelsAPI

# Step 1: Call the API to obtain video links
pexels_api = PexelsAPI(
    api="your/pexels/api", 
    save_path="pexels_api.npy"
)
pexels_api.fetch_api(
    start_page=1, # Starting page
    end_page=2, # Ending page
    save_api_dict_every_pages=1 # Save the API dictionary every n pages
)

# Step 2: Download videos
pexels = PexelsVdieoFetch("pexels")
pexels.download(
    api_npy_save_path="pexels_api.npy", 
    chrome_exe_path=r"your/chrome/exe/path",
    headless=False
)

Video Annotation Platform

We have developed a video annotation platform in Rust that efficiently generates labels for media such as images and videos. The platform can call state-of-the-art AI models such as GPT-4o, Gemini, and Claude 3, and offers flexible configuration options. It is designed for high performance: it can process 100 queries per second, and its task processing capacity scales to 200 million queries. With 100 API accounts, the tool can synthesize a dataset containing 200,000 videos within 8 hours. All outputs are organized by model and prompt, which keeps the structure clear for subsequent research and application integration.

(If display issues occur, try another browser, e.g., Edge.)
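The throughput figures quoted for the annotation platform can be related to each other with some quick arithmetic. The sketch below assumes one query per video, which the text does not state, so the per-video headroom it reports is only indicative.

```python
VIDEOS = 200_000      # dataset size quoted above
HOURS = 8             # time budget quoted above
ACCOUNTS = 100        # API accounts quoted above
PLATFORM_QPS = 100    # platform throughput quoted above


def required_rates(videos: int, hours: float, accounts: int) -> tuple[float, float]:
    """Return (videos per second overall, videos per minute per account)."""
    total_per_second = videos / (hours * 3600)
    per_account_per_minute = videos / accounts / (hours * 60)
    return total_per_second, per_account_per_minute


total_per_second, per_account_per_minute = required_rates(VIDEOS, HOURS, ACCOUNTS)
capacity = PLATFORM_QPS * HOURS * 3600  # queries the platform can issue in 8 hours

print(f"{total_per_second:.2f} videos/s overall")               # 6.94
print(f"{per_account_per_minute:.2f} videos/min per account")    # 4.17
print(f"{capacity / VIDEOS:.1f} queries available per video")    # 14.4
```

Under that assumption, annotating 200,000 videos in 8 hours needs about 7 videos per second in total, well below the stated 100 queries per second, which leaves room for several prompts or models per video.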

Annotation Validation Platform

We provide an annotation validation platform where users can view, validate, and modify annotations for already annotated video datasets.

Usage:

Open the HuggingFace link and enter your personal token.
Play videos, view the corresponding annotation text, modify it if needed, then switch to the next video.

For custom datasets:

The dataset and code must be on the same platform (e.g., the dataset is hosted on HuggingFace).
Modify the dataset path in run.py.

Data Migration

We provide a general-purpose data migration platform for transferring datasets from HuggingFace to ModelScope, facilitating access and usage of datasets across different regional networks.

Usage:

Enter your HuggingFace access token and dataset path, along with your ModelScope access token and repository directory, then click Submit. The dataset is then migrated automatically from HuggingFace to the corresponding ModelScope repository.

👨‍💻 Contributors

Crawling Algorithms: @yangming @heatingma @ZZY @晚来风雪

Video Download: @yangming @晚来风雪 @杰杰杰

Data Cleaning: @一马平川 @zjukop @伊小布

Prompt: @Tiger.C @dpyneo @巧克力

Labeling: @YUE @zjukop

Validation Platform: @YUE @晚来风雪

Data Migration: @晚来风雪 @heatingma

Manual Validation: @一马平川 @dpyneo @杨嘉昊 @flipped @yi @believe @思恩

Project Research: @dingby @believe

Aesthetic Guidance: @图拉 @杨嘉昊

Documentation: @ZZY @枪枪

Project Coordination: @巧克力

🙏 Acknowledgments

Server/Financial Support: Li Bai AI Lab

Storage: HuggingFace, ModelScope, OPENDataLab

Technical Sharing: @shoulder @王铁震 @杨欢 @新年京

Discussion: @前仰跳投 @浮羽 @MYX @Winniy @GUI @Planet

✨ Connection

©️ License

This project is licensed under the CC-BY-4.0 open-source license.
