English | 简体中文
# OMG

OMG is a tool for processing meeting recordings (or screen captures), extracting key frames, and generating transcripts. It features a lightweight modern interface built with Gradio.
## Features

- Two input modes:
  - Video file upload: process pre-recorded videos
  - Screen capture: capture and process screen content in real time
- Extract key frames based on content similarity
- Generate transcripts using OpenAI's Whisper models
- Manually edit the extracted frames and transcript before export
- Export results as PDF, audio, and text files
- Lightweight modern interface built with Gradio
- Extreme compression: in one test, a 2.48 GB video was reduced to just 321 MB
## Requirements

- Python >= 3.8 (tested on 3.12)
- CUDA-compatible GPU (optional, for faster processing)
- FFmpeg (for audio processing)
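A quick way to verify these prerequisites is a small standard-library script. This is only a sketch: the `check_prerequisites` helper is hypothetical, not part of OMG, and the CUDA check reports `False` unless PyTorch is installed.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Report whether the basic requirements listed above are met."""
    results = {
        # Python >= 3.8
        "python": sys.version_info[:2] >= min_python,
        # FFmpeg must be on PATH for audio extraction
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
    try:
        import torch  # optional: only needed for GPU acceleration
        results["cuda"] = torch.cuda.is_available()
    except ImportError:
        results["cuda"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```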
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/ZhenghaoYang19/omg.git
  cd omg
  ```

- Create a virtual environment and install dependencies:

  ```bash
  # Install uv (or download from https://github.com/astral-sh/uv)
  pip install uv

  # Create and activate virtual environment
  uv venv
  .venv\Scripts\activate     # On Windows
  source .venv/bin/activate  # On Linux/macOS

  # Install dependencies
  uv pip install -r requirements.txt
  ```

  Or, using pip directly:

  ```bash
  pip install -r requirements.txt
  ```
- Install FFmpeg:

  - Windows: download from the FFmpeg website
  - Linux:

    ```bash
    sudo apt-get install ffmpeg
    ```

  - macOS:

    ```bash
    brew install ffmpeg
    ```
## Usage

- Start the web interface:

  ```bash
  python app.py
  ```

- Open your browser and navigate to `http://localhost:7860`
- Choose your input mode:

  Video file upload:

  - Upload your video file
  - Adjust the configuration settings (optional)
  - Click "Process Video"
  - Wait for processing to complete
  - Switch to the "Results & Export" tab to view the results

  Screen capture:

  - Select the capture type (Monitor or Window)
  - For Monitor capture: choose the monitor from the dropdown
  - For Window capture: enter the window title or part of it (e.g. chrome, edge, TencentMeeting)
  - Adjust the configuration settings (optional)
  - Click "Start Capture"
  - Click "Stop Capture" when finished
  - Switch to the "Results & Export" tab to view the results
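Matching a window by partial title boils down to a case-insensitive substring filter. The sketch below illustrates the idea only; `find_windows` and the example titles are hypothetical, not OMG's actual API.

```python
def find_windows(titles, query):
    """Return all window titles containing `query`, case-insensitively.

    Mirrors the "window title (or part of it)" behaviour described above.
    """
    q = query.lower()
    return [t for t in titles if q in t.lower()]


# Example with hypothetical open-window titles:
open_windows = ["Meeting Notes - Google Chrome", "TencentMeeting", "Terminal"]
print(find_windows(open_windows, "chrome"))  # → ['Meeting Notes - Google Chrome']
```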
- Export the results:

  - Select the desired export options (PDF/Audio/Transcript)
  - Click "Export Selected Files"
  - Download the exported files
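The transcript export step can be sketched as follows. This is a sketch only, assuming Whisper-style segments with `start`, `end`, and `text` fields; the exact format OMG writes may differ.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as MM:SS for transcript lines."""
    m, s = divmod(int(seconds), 60)
    return f"{m:02d}:{s:02d}"


def write_transcript(segments, path):
    """Write timestamped segments to a plain-text transcript file."""
    with open(path, "w", encoding="utf-8") as f:
        for seg in segments:
            f.write(f"[{format_timestamp(seg['start'])} - "
                    f"{format_timestamp(seg['end'])}] {seg['text'].strip()}\n")


# Example with made-up segments:
segments = [
    {"start": 0.0, "end": 4.2, "text": " Welcome to the meeting."},
    {"start": 4.2, "end": 9.8, "text": " First item on the agenda..."},
]
write_transcript(segments, "transcript.txt")
```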
## Configuration

You can adjust the following parameters:
- Similarity Threshold (0.0-1.0): Controls how dissimilar a frame must be from the previous key frame to be kept as a new key frame; suggested values are between 0.6 and 0.7
- Frames Per Second: Number of frames to sample per second, suggested value between 0.2 and 5
- Start/End Time (Video upload only): Process only a specific portion of the video
- ASR Model: Choose between different Whisper models
- ASR Device: Select processing device (suggested: auto)
- Frame Comparison Method: Choose between different frame comparison algorithms
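To illustrate how the similarity threshold drives key-frame selection, here is a minimal pure-Python sketch over flat grayscale frames. OMG's actual comparison algorithms live in `utils/compare.py` and may differ.

```python
def similarity(a, b):
    """Similarity in [0, 1] between two equal-length grayscale frames
    (1.0 = identical), based on mean absolute pixel difference."""
    diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - diff / 255.0


def extract_key_frames(frames, threshold=0.65):
    """Keep a frame only if it is sufficiently *different* from the
    last key frame, i.e. its similarity falls below the threshold."""
    if not frames:
        return []
    keys = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if similarity(frames[i], frames[keys[-1]]) < threshold:
            keys.append(i)
    return keys


# Three tiny "frames": the second is nearly identical to the first,
# the third is very different, so indices 0 and 2 are kept.
frames = [[0, 0, 0, 0], [0, 0, 0, 5], [255, 255, 255, 255]]
print(extract_key_frames(frames, threshold=0.65))  # → [0, 2]
```

A higher threshold keeps more frames (small changes already count as "different"); a lower one keeps fewer.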
Default values can be modified in `config.json`.
## Project Structure

```
omg/
├── app.py              # Web interface
├── omg.py              # Core processing logic
├── utils/
│   ├── compare.py      # Frame comparison functions
│   └── images2pdf.py   # PDF generation
├── config.json         # Configuration file
├── requirements.txt    # Python dependencies
└── output/             # Processing results
```
## Output Structure

For video upload:

```
output/
└── video_name/
    ├── images/         # Extracted frames
    ├── audio.wav       # Extracted audio
    ├── transcript.txt  # Generated transcript
    └── slides.pdf      # Generated PDF (optional)
```

For screen capture:

```
output/
└── screen_capture/
    └── YYYYMMDD_HHMMSS/    # Timestamp of capture
        ├── images/         # Captured frames
        ├── audio.wav       # Recorded audio
        ├── transcript.txt  # Generated transcript
        └── slides.pdf      # Generated PDF (optional)
```
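The timestamped directory name for screen captures follows the `YYYYMMDD_HHMMSS` pattern above, which can be sketched with the standard library. The `capture_output_dir` helper is hypothetical, not OMG's actual code.

```python
from datetime import datetime
from pathlib import Path


def capture_output_dir(base="output", now=None):
    """Build output/screen_capture/YYYYMMDD_HHMMSS/ for a new capture."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    path = Path(base) / "screen_capture" / stamp
    path.mkdir(parents=True, exist_ok=True)
    return path


print(capture_output_dir())  # e.g. output/screen_capture/20240101_120000
```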
## License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## Acknowledgements

- wudududu/extract-video-ppt for the inspiration and reference
- Gradio for the web interface
- OpenAI for the Whisper ASR model