Looking to support extraction from mp4, mov, webm, and avi files, as well as YouTube, for a vision-language model (not a video model):
mp4
mov
webm
avi
youtube
Video and audio inputs are not standard in commercial multimodal models today. Because of this, I am looking to transcribe the audio.
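To make the request concrete, here is a rough sketch of how an input might be routed to one of the source types listed above before any extraction or transcription happens. Everything in it is an assumption for illustration: the function name `classify_source`, the set of YouTube hostnames, and the `"unsupported"` fallback are hypothetical and not part of this project. It only inspects the string; it does not download, decode, or transcribe anything.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Container formats named in this issue.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".avi"}

# Assumed set of hostnames treated as YouTube (illustrative, not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source: str) -> str:
    """Return 'youtube', a bare extension such as 'mp4', or 'unsupported'."""
    parsed = urlparse(source)
    is_web_url = parsed.scheme in ("http", "https")

    # YouTube links are recognized by hostname rather than by file extension.
    if is_web_url and (parsed.hostname or "").lower() in YOUTUBE_HOSTS:
        return "youtube"

    # For web URLs use only the path (drops the query string);
    # for anything else treat the whole string as a file path.
    path = parsed.path if is_web_url else source
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return suffix.lstrip(".")
    return "unsupported"
```

A caller could use the returned label to decide which step runs next, for example pulling the audio track for transcription, while rejecting anything labeled `"unsupported"` early.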