Whisper-YT-Transcriber is a Python-based CLI tool that uses OpenAI's Whisper ASR model to transcribe YouTube videos. It can transcribe an individual YouTube video or a complete YouTube channel. The tool combines each video's metadata with its generated transcription in a .txt file, and the output files are organized by that metadata.
The Whisper model performs remarkably well, and because it is open source, there are no licensing fees. The models run locally.
There are five model sizes, four of which also have English-only versions, offering tradeoffs between speed and accuracy. Below is a comparison chart from Whisper's documentation. The smaller models (tiny and base) use a fraction of the memory and are 8 to 32 times faster than the two largest tiers.
Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed
---|---|---|---|---|---
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x
base | 74 M | base.en | base | ~1 GB | ~16x
small | 244 M | small.en | small | ~2 GB | ~6x
medium | 769 M | medium.en | medium | ~5 GB | ~2x
large | 1550 M | N/A | large.v1, large.v2 | ~10 GB | 1x
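
As a rough illustration of how the chart can guide model choice, the sketch below picks the largest model size whose approximate VRAM requirement fits a given budget. The `pick_model` helper and `MODEL_TABLE` are hypothetical names used only for this example and are not part of the tool; the numbers are copied from the chart.

```python
# Approximate VRAM needs (GB) and relative speeds, copied from the chart above.
# MODEL_TABLE and pick_model are hypothetical names, not part of the tool.
MODEL_TABLE = [
    # (size, required_vram_gb, relative_speed)
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 6),
    ("medium", 5, 2),
    ("large", 10, 1),
]


def pick_model(available_vram_gb: float, english_only: bool = True) -> str:
    """Return the largest model size whose approximate VRAM need fits."""
    fitting = [size for size, vram, _ in MODEL_TABLE if vram <= available_vram_gb]
    if not fitting:
        raise ValueError("No Whisper model fits in the available VRAM")
    size = fitting[-1]  # the table is ordered from smallest to largest
    # 'large' has no English-only variant, so it is returned as-is.
    if english_only and size != "large":
        return f"{size}.en"
    return size


if __name__ == "__main__":
    print(pick_model(4))                      # small.en
    print(pick_model(4, english_only=False))  # small
```

Note that the helper returns the table's size label; for the large tier, the tool's `--model` option expects a specific version name.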
- Python 3.7+
- Python packages:
  - yt_dlp
  - whisper
  - pandas
  - pytest (for testing)
The memory and CPU requirements depend on the Whisper model used and the size of the video files being transcribed. We recommend testing with the smaller models (tiny.en, base.en, small.en) and shorter videos (under 15 minutes) to gauge system demand before trying larger models and longer videos.
Clone this repository to your local machine, navigate into the project directory, and install the required Python dependencies.
# clone project
git clone https://github.com/yourusername/youtube-transcriber.git
# cd to working directory
cd youtube-transcriber
# install requirements
pip install -r requirements.txt
# using full command
python main.py list CHANNEL_URL CHANNEL_NAME
# using alias
python main.py l CHANNEL_URL CHANNEL_NAME
# using full command
python main.py transcribe_channel CHANNEL_URL CHANNEL_NAME --model MODEL_NAME
# using alias
python main.py tc CHANNEL_URL CHANNEL_NAME -m MODEL_NAME
# using full command
python main.py transcribe_video VIDEO_URL --model MODEL_NAME
# using alias
python main.py tv VIDEO_URL -m MODEL_NAME
The Whisper 'base.en' model is defined as the default. You can specify a different model using the --model option.
Supported --model values
- English:
[tiny.en, base.en, small.en, medium.en]
- Multi-lingual:
[tiny, base, small, medium, large.v1, large.v2]
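
For readers curious how a command layout like this can be wired up, here is a minimal sketch using the standard library's `argparse`. It is an assumption that the tool is built this way; the source does not say which CLI library `main.py` uses, and `build_parser` is a hypothetical name. The commands, aliases, model names, and default are taken from the text above.

```python
import argparse

# Model names as listed under "Supported --model values".
MODELS = [
    "tiny.en", "base.en", "small.en", "medium.en",
    "tiny", "base", "small", "medium", "large.v1", "large.v2",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented commands and aliases."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    def add_model_option(p: argparse.ArgumentParser) -> None:
        # base.en is the documented default model.
        p.add_argument("-m", "--model", choices=MODELS, default="base.en")

    p_list = sub.add_parser("list", aliases=["l"])
    p_list.add_argument("channel_url")
    p_list.add_argument("channel_name")

    p_tc = sub.add_parser("transcribe_channel", aliases=["tc"])
    p_tc.add_argument("channel_url")
    p_tc.add_argument("channel_name")
    add_model_option(p_tc)

    p_tv = sub.add_parser("transcribe_video", aliases=["tv"])
    p_tv.add_argument("video_url")
    add_model_option(p_tv)

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["tv", "VIDEO_URL", "-m", "small.en"])
    print(args.command, args.video_url, args.model)
```

With `choices=MODELS`, an unsupported model name is rejected at parse time with a usage error rather than failing later during transcription.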
Note: the first time you run the tool with a given model, that model is downloaded, which is fairly quick for the smaller models. The downloaded model is stored locally and reused by subsequent runs without downloading it again.
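
To check whether a model has already been downloaded, something like the sketch below may help. It assumes the tool relies on the default download location of the openai-whisper package (`~/.cache/whisper`, or `$XDG_CACHE_HOME/whisper` when that variable is set) and that the checkpoint file is named `<model>.pt`; if the tool passes its own download directory, or maps names such as `large.v2` to a different upstream filename, adjust accordingly. Both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def default_whisper_cache() -> Path:
    """Default download directory used by openai-whisper (assumed unchanged by the tool)."""
    base = os.getenv("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    return Path(base) / "whisper"


def is_model_cached(model_name: str, cache_dir: Optional[Path] = None) -> bool:
    """Return True if a checkpoint file for model_name exists in the cache directory."""
    cache_dir = Path(cache_dir) if cache_dir is not None else default_whisper_cache()
    # Assumption: the checkpoint is stored as '<model_name>.pt', e.g. 'base.en.pt'.
    return (cache_dir / f"{model_name}.pt").is_file()


if __name__ == "__main__":
    for name in ("tiny.en", "base.en", "small.en"):
        status = "cached" if is_model_cached(name) else "will be downloaded on first use"
        print(f"{name}: {status}")
```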
The project is organized as follows:
- main.py: The entry point of the application.
- transcribe-yt.py: This script contains the logic to transcribe YouTube videos.
- data/input/: This directory stores the audio files downloaded from YouTube.
- data/output/: This directory stores the transcriptions generated by the tool.
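
To make the output step concrete, the sketch below shows one way a transcript and its metadata could be combined into a .txt file under an output directory. The metadata keys (`channel`, `upload_date`, `video_id`, `title`), the per-channel folder, and the file naming are assumptions made for illustration; the tool's actual layout may differ.

```python
from pathlib import Path


def write_transcript(output_dir, metadata: dict, text: str) -> Path:
    """Write a metadata header followed by the transcript text.

    Hypothetical layout: <output_dir>/<channel>/<upload_date>_<video_id>.txt
    """
    channel_dir = Path(output_dir) / metadata["channel"]
    channel_dir.mkdir(parents=True, exist_ok=True)
    out_path = channel_dir / f"{metadata['upload_date']}_{metadata['video_id']}.txt"
    # One 'key: value' line per metadata field, then a blank line, then the text.
    header = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    out_path.write_text(f"{header}\n\n{text}\n", encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Example values are placeholders, not real video data.
    example = {
        "channel": "CHANNEL_NAME",
        "upload_date": "20230101",
        "video_id": "VIDEO_ID",
        "title": "Example title",
    }
    print(write_transcript("data/output", example, "Transcribed text goes here."))
```

Keeping the metadata as a plain-text header means each file stays readable on its own, without a separate index.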
The pytest framework is used for running tests on the project. To execute the tests, run the following command:
pytest
Contributions are welcome! Please read our Contributing Guide and our Code of Conduct for more information.
This project is licensed under the MIT License.