A real-time speech-to-text transcription system for Telugu using a pretrained ASR model.
This project uses the Harveenchadha/vakyansh-wav2vec2-telugu-tem-100 model from Hugging Face to convert Telugu speech into text in real time. The system captures live audio through your earphone's microphone and processes it with PyTorch and the Transformers library.
- Real-Time Transcription: Captures and transcribes Telugu speech live.
- Pretrained ASR Model: Uses the pretrained wav2vec2 Telugu model published by Harveenchadha on Hugging Face.
- Interactive Development: Built and debugged using VS Code with the Jupyter extension.
- PyTorch: Deep learning framework for model computations.
- Torchaudio: Audio processing capabilities integrated with PyTorch.
- Transformers: Access to pretrained models, including our ASR model.
- Sounddevice: Captures live audio input.
- Numpy: Array operations and data manipulation.
- Scipy: Additional audio file handling and signal processing.
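As a rough illustration of how these libraries fit together, the sketch below loads the model and transcribes a single waveform. It is not taken from the project notebook: the helper names `load_asr` and `transcribe` are hypothetical, the 16 kHz sample rate is an assumption (it is the usual rate for wav2vec2 models), and it assumes the model repository ships a processor that `Wav2Vec2Processor` can load.

```python
MODEL_ID = "Harveenchadha/vakyansh-wav2vec2-telugu-tem-100"
SAMPLE_RATE = 16_000  # assumption: typical wav2vec2 input rate, check the model card


def load_asr(model_id=MODEL_ID):
    """Download (or load from cache) the processor and the CTC model."""
    # Imported lazily so that merely defining the helpers stays cheap.
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()  # inference only, disables dropout
    return processor, model


def transcribe(waveform, processor, model, sample_rate=SAMPLE_RATE):
    """Greedy CTC decoding of one mono float waveform (1-D array or list)."""
    import torch

    inputs = processor(
        waveform, sampling_rate=sample_rate, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)  # most likely token per frame
    return processor.batch_decode(predicted_ids)[0]
```

Calling `processor, model = load_asr()` once and reusing both objects for every recording avoids reloading the weights on each transcription.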
- Clone the Repository:
git clone https://github.com/MAvinash24/ML_Project.git
cd ML_Project
- Install Dependencies: Use pip to install the required libraries:
pip install torch torchaudio transformers sounddevice numpy scipy
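After installing, a quick check along the following lines can confirm that every package is importable before opening the notebook. This is a minimal sketch, not part of the repository; the function name `check_dependencies` is hypothetical, and the package list simply mirrors the pip command above.

```python
import importlib.util

# Import names of the packages listed in the pip command.
REQUIRED = ["torch", "torchaudio", "transformers", "sounddevice", "numpy", "scipy"]


def check_dependencies(names=REQUIRED):
    """Map each package name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_dependencies().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed again with pip in the same environment that VS Code uses for the notebook kernel.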
- Open VS Code: Ensure that you have installed the Python and Jupyter extensions from the VS Code Marketplace.
- Open the Notebook: Open the .ipynb file provided in the repository. VS Code will load the interactive notebook interface.
- Execute Cells: The notebook is divided into cells:
  - Installation Cell: Contains the pip installation commands.
  - Imports and Function Definitions: Contains all the necessary Python code for audio processing.
  - Main Execution Cell: Contains the loop that records, processes, and prints transcriptions.
- Run each cell one by one using the "Run Cell" button. This helps verify that each part of the code works correctly.
Follow the debug messages printed in the terminal:
- [DEBUG] READY: System is ready to capture audio.
- [DEBUG] STOPPED: Audio capture has ended, and processing is beginning.
- [DEBUG] TRANSCRIPTION: The output text after processing.
If issues arise, check each cell individually to isolate and resolve the error.
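To make the order of the debug messages concrete, here is a hedged sketch of one record-and-transcribe pass. It is an illustration, not the notebook's actual code: `record_chunk` and `run_once` are hypothetical names, the 5-second duration and 16 kHz rate are assumptions, the exact message format is guessed from the labels above, and `transcribe_fn` stands for whatever function turns audio into text.

```python
def record_chunk(duration_s=5.0, sample_rate=16_000):
    """Record mono audio from the default input device as a 1-D float32 array."""
    # Imported lazily so run_once can be exercised without audio hardware.
    import sounddevice as sd

    frames = int(duration_s * sample_rate)
    audio = sd.rec(frames, samplerate=sample_rate, channels=1, dtype="float32")
    sd.wait()  # block until the recording has finished
    return audio[:, 0]  # drop the channel axis


def run_once(record_fn, transcribe_fn, log=print):
    """Run one capture/transcribe pass, logging each stage in order."""
    log("[DEBUG] READY")
    audio = record_fn()
    log("[DEBUG] STOPPED")
    text = transcribe_fn(audio)
    log(f"[DEBUG] TRANSCRIPTION: {text}")
    return text
```

Because the recorder and the transcriber are passed in as arguments, each half can be tested on its own: if READY appears but STOPPED never does, the problem is in audio capture; if STOPPED appears without TRANSCRIPTION, it is in the model step.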
Using Your Earphone as the Audio Input Device: Make sure your earphone's built-in microphone is selected as the default input device in your system settings. This setup ensures that the sounddevice library captures your speech accurately during the transcription process.
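If it is unclear which device sounddevice will record from, listing the available inputs can help. The sketch below is an assumption-labeled example rather than project code: `input_devices` and `list_microphones` are hypothetical helper names, and they rely on `sounddevice.query_devices()` returning one dictionary per device with `name` and `max_input_channels` keys.

```python
def input_devices(devices):
    """Return (index, name) for every device that can record audio."""
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device["max_input_channels"] > 0
    ]


def list_microphones():
    """Query the audio devices visible to sounddevice and keep the inputs."""
    # Imported lazily so input_devices can be used without audio hardware.
    import sounddevice as sd

    return input_devices(sd.query_devices())
```

Running `print(list_microphones())` in a notebook cell should show the earphone microphone among the entries once it is connected; if it is missing, the operating system has not exposed it as an input device.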
Special thanks to Harveenchadha for creating the pretrained vakyansh-wav2vec2-telugu-tem-100 model. Their contribution has been pivotal in making this project possible.