Video frame + transcript extraction #7

emcf · 2024-04-13T22:41:25Z

Looking to support extraction from mp4, mov, webm, and avi files, as well as YouTube videos, for a vision-language model (not a video model).
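
A minimal sketch of the input-handling step, assuming the extractor shells out to the `ffmpeg` CLI. This issue does not confirm that, and it does not say how PR #12 actually works. The helper names `classify_source` and `frame_sampling_command` are hypothetical. The sketch accepts the four container formats listed above, recognizes YouTube links by hostname, and builds (but does not run) an ffmpeg command that samples still frames at a fixed rate for the vision-language model:

```python
from pathlib import Path
from urllib.parse import urlparse

# Container formats named in this issue.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".webm", ".avi"}

# Hostnames treated as YouTube (an assumption; not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source: str) -> str:
    """Return 'youtube' or 'video'; raise ValueError for unsupported input (hypothetical helper)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        if parsed.netloc.lower() in YOUTUBE_HOSTS:
            return "youtube"
        raise ValueError(f"unsupported URL: {source}")
    # Local file: accept only the listed container extensions, case-insensitively.
    if Path(source).suffix.lower() in SUPPORTED_EXTENSIONS:
        return "video"
    raise ValueError(f"unsupported file type: {source}")


def frame_sampling_command(video: str, out_dir: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that writes `fps` still frames per second as numbered PNGs."""
    pattern = str(Path(out_dir) / "frame_%05d.png")
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", pattern]
```

The command could then be run with `subprocess.run(frame_sampling_command("clip.mp4", "frames"), check=True)`. YouTube links would still need a separate download step (for example with a tool such as yt-dlp) before frames can be sampled. This sketch does not cover that step.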

Video and audio inputs are not yet standard in commercial multimodal models. Because of this, I am looking to transcribe the audio track instead.
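
One way to feed the transcript to an image-only model is to extract the audio track, transcribe it, and attach the speech to the frames it overlaps. The sketch below assumes ffmpeg for audio extraction and a speech-to-text model that returns timed segments. The `Segment` shape and both helper names are hypothetical. The 16 kHz mono WAV target matches what models such as Whisper consume. Each frame sampled at `fps` is treated as covering the window `[i / fps, (i + 1) / fps)`, which is an approximation of ffmpeg's actual frame timing.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment with times in seconds (hypothetical shape)."""
    start: float
    end: float
    text: str


def audio_extraction_command(video: str, wav_out: str) -> list[str]:
    """Build an ffmpeg command that drops video (-vn) and writes 16 kHz mono WAV."""
    return ["ffmpeg", "-i", video, "-vn", "-ac", "1", "-ar", "16000", wav_out]


def interleave(num_frames: int, fps: float, segments: list[Segment]) -> list[tuple[float, str]]:
    """Pair each sampled frame's start time with the transcript text overlapping its window."""
    step = 1.0 / fps
    result = []
    for i in range(num_frames):
        t = i / fps  # approximate timestamp of frame i at a constant sampling rate
        # A segment overlaps [t, t + step) if it starts before the window ends
        # and ends after the window starts.
        spoken = " ".join(s.text for s in segments if s.start < t + step and s.end > t)
        result.append((t, spoken))
    return result
```

Interleaving frames and text this way lets a vision-language model see which words were spoken over each image, without requiring native video or audio input.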


emcf added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Apr 13, 2024

emcf changed the title from ~~Video extraction~~ to Video frame + transcript extraction Apr 18, 2024

emcf linked a pull request Apr 28, 2024 that will close this issue

Implemented video + audio extraction #12 (Merged)

emcf mentioned this issue Apr 28, 2024

emcf closed this as completed in #12 Apr 28, 2024