Clipper is a Python script that processes video files by clipping segments based on keywords found in accompanying subtitle files. The script generates various outputs including clipped video, audio, subtitles, a frame image, and metadata.
- Clips video files based on keywords found in subtitle files.
- Configurable pre- and post-keyword durations for each clip.
- Generates a new folder with clipped video files.
- Outputs include:
  - Clipped video (.mp4)
  - Frame image (.png)
  - Audio (.mp3)
  - Subtitles (.srt)
  - Metadata (.json)
- Process all .mp4 files in the input folder.
- Each .mp4 file should have a corresponding .srt file with the same name.
- If the .srt file is missing, log a warning and skip the video file.
- Search for specified keywords in the .srt files.
- If a keyword is found, extract a clip from the video.
- Keyword search should use fuzzy matching with a configurable match threshold.
- The clip should start at a specified number of seconds before the keyword's timestamp.
- The clip should end at a specified number of seconds after the keyword's timestamp.
- Sanitize filenames to remove non-ASCII characters and replace spaces with underscores.
- Extract a frame from the video at the keyword's timestamp.
- Extract audio from the video for the duration of the clip.
- Extract subtitles from the .srt file for the duration of the clip.
- Save the extracted clip, frame, audio, and subtitles to the output folder.
- Save metadata about the clip to a .json file.
- Save the center point and keyword to a .txt file.
- Log errors, warnings, and informational messages encountered during processing.
- Accept command-line arguments for input folder, output folder, keywords, pre-duration, and post-duration.
- Create the output folder if it does not exist.
- Verify that the durations of the source .mp4 and .srt files, in seconds, are roughly equal.
- Search for possessive forms of proper nouns such as names. For example, if the keyword is "John", also search for "John's".
- Search for present participle forms of verbs. For example, if the keyword is "run", also search for "running".
- Search for past tense and past participle forms of verbs. For example, if the keyword is "fail", also search for "failed".
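The matching rules in the list above (fuzzy search with a threshold, plus possessive, present participle, and past tense forms) can be sketched as follows. This is a minimal illustration, not the script's actual code: the names `keyword_variants` and `find_keyword` are hypothetical, the inflection rules are deliberately naive and may over-generate, and the standard library's `difflib` stands in for fuzzywuzzy so the example runs without extra packages.

```python
import re
from difflib import SequenceMatcher


def keyword_variants(keyword):
    """Return the keyword plus naive possessive, -ing, and -ed forms (hypothetical helper)."""
    base = keyword.lower()
    variants = {base, base + "'s"}
    if base.endswith("e"):
        # "bake" -> "baking", "baked"
        variants.update({base[:-1] + "ing", base + "d"})
    else:
        variants.update({base + "ing", base + "ed"})
        # Consonant-vowel-consonant ending: also try a doubled final letter ("run" -> "running")
        if re.fullmatch(r".*[^aeiou][aeiou][^aeiouwxy]", base):
            variants.update({base + base[-1] + "ing", base + base[-1] + "ed"})
    return variants


def find_keyword(text, keyword, threshold=0.8):
    """Return the first word in text whose similarity to any variant meets threshold, else None."""
    variants = keyword_variants(keyword)
    for word in re.findall(r"[a-z']+", text.lower()):
        for variant in variants:
            # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge
            if SequenceMatcher(None, word, variant).ratio() >= threshold:
                return word
    return None


print(find_keyword("He was running late.", "run"))
```

Raising `threshold` toward 1.0 makes matching stricter; lowering it tolerates more transcription errors at the cost of false positives. The 0.8 default here is an arbitrary choice for the sketch.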
It is recommended to use a Python virtual environment to manage dependencies and avoid conflicts with other projects. Follow these steps to set up and activate a virtual environment:
- Create a virtual environment: `python -m venv .venv`
- Activate the virtual environment:
  - On Windows: `.\.venv\Scripts\activate`
  - On macOS and Linux: `source .venv/bin/activate`
Requirements:
- Python 3.x
- ffmpeg (must be installed and available in the system PATH)
- fuzzywuzzy (for fuzzy keyword matching)
- python-Levenshtein (optional, for improved performance of fuzzywuzzy)

Install the required packages using pip and the provided requirements.txt file:
```sh
pip install -r requirements.txt
```
- Run the script: `python clipper.py`
```python
# Example usage
input_folder = './mediaSource'
output_folder = './mediaClipOutput'
keywords = ['example', 'keyword']
pre_duration = 10  # seconds
post_duration = 10  # seconds
process_videos(input_folder, output_folder, keywords, pre_duration, post_duration)
```
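The requirements call for command-line arguments covering the same five values used in the example. A minimal `argparse` sketch of such an interface might look like the following. The flag names and the `build_parser` helper are assumptions for illustration, not the script's documented interface; the defaults are simply copied from the example values.

```python
import argparse


def build_parser():
    """Build a hypothetical command-line interface; flag names are illustrative only."""
    parser = argparse.ArgumentParser(
        description="Clip video segments around subtitle keywords."
    )
    parser.add_argument("--input-folder", default="./mediaSource",
                        help="folder containing .mp4 files and matching .srt files")
    parser.add_argument("--output-folder", default="./mediaClipOutput",
                        help="folder that receives the generated clips")
    parser.add_argument("--keywords", nargs="+", required=True,
                        help="one or more keywords to search for")
    parser.add_argument("--pre-duration", type=float, default=10,
                        help="seconds to include before the keyword")
    parser.add_argument("--post-duration", type=float, default=10,
                        help="seconds to include after the keyword")
    return parser


# Parse an explicit argument list here so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--keywords", "example", "keyword", "--pre-duration", "5"]
)
print(args.keywords, args.pre_duration, args.post_duration)
```

The parsed values could then be passed to `process_videos` in the same order as in the example.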
- Clipping Highlights from Videos:
  - Extract highlights from sports events or lectures based on specific keywords.
- Creating Video Summaries:
  - Generate summaries of long videos by clipping segments around important keywords.
- Content Creation:
  - Create short clips for social media by extracting segments around trending keywords.
Output files are named after the original video file, the keyword that located the clip's center point, and the center point timestamp. For example:
```
mediaClipOutput/keyword1/
├── video1_example_00-01-23-456.mp4
├── video1_example_00-01-23-456.jpg
├── video1_example_00-01-23-456.mp3
├── video1_example_00-01-23-456.srt
└── video1_example_00-01-23-456.json
```
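The naming pattern above, combined with the filename sanitization rule (remove non-ASCII characters, replace spaces with underscores), can be sketched like this. The function names `sanitize_filename` and `clip_basename` are hypothetical, and the assumption that the timestamp comes from an SRT-style string such as `00:01:23,456` is inferred from the metadata example rather than stated by the script.

```python
import re


def sanitize_filename(name):
    """Drop non-ASCII characters and replace spaces with underscores."""
    ascii_only = name.encode("ascii", errors="ignore").decode("ascii")
    return ascii_only.replace(" ", "_")


def clip_basename(video_stem, keyword, center_timestamp):
    """Join video stem, keyword, and timestamp into an extension-free file name."""
    # Colons and commas are awkward in file names, so swap them for hyphens.
    stamp = re.sub(r"[:,]", "-", center_timestamp)
    return sanitize_filename(f"{video_stem}_{keyword}_{stamp}")


print(clip_basename("video1", "example", "00:01:23,456"))
# prints: video1_example_00-01-23-456
```

Each output file would then append its own extension to this shared base name.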
The .json metadata file includes information about the clip:
```json
{
  "center_point": "00:01:23,456",
  "keyword": "example",
  "start_time": "0:01:13",
  "duration": 20
}
```
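The values in this example are consistent with a 10-second pre-duration and a 10-second post-duration around the center point. One way such metadata could be derived is sketched below. The helpers `srt_to_seconds` and `build_metadata` are hypothetical, and the use of `timedelta` to format `start_time` is an inference from the `0:01:13` format, not something the script confirms. The sketch does not handle keywords that occur closer to the start of the video than the pre-duration.

```python
import json
from datetime import timedelta


def srt_to_seconds(timestamp):
    """Convert an SRT timestamp such as '00:01:23,456' to seconds."""
    hms, millis = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def build_metadata(center_point, keyword, pre_duration, post_duration):
    """Assemble a metadata dict with the same fields as the example (hypothetical helper)."""
    start = srt_to_seconds(center_point) - pre_duration
    return {
        "center_point": center_point,
        "keyword": keyword,
        # Truncate to whole seconds; timedelta renders them as H:MM:SS.
        "start_time": str(timedelta(seconds=int(start))),
        "duration": pre_duration + post_duration,
    }


print(json.dumps(build_metadata("00:01:23,456", "example", 10, 10), indent=2))
```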
This project is licensed under the MIT License.
© 2024 Matt Ladewig