Problem-4 (NLP: Panel discussion summarization)
Context:
You have been hired by a company that analyzes internet data in order to monetize it. They have asked you to build a solution that analyzes YouTube videos of panel discussions. Specifically, you have to run speech-to-text on the video's audio to obtain a transcript and then summarize that transcript.
Technical details:
Input: YouTube URL of a panel discussion
Output: Textual summary of the discussion
--> !pip install SpeechRecognition  # converts audio to text
--> !pip install youtube_dl  # downloads the audio track of the YouTube video
--> !pip install pydub  # converts the mp3 file to wav format
--> os needs no install: it is part of the Python standard library and is used to save, rename, or delete the mp3 file.
--> !pip3 install git+https://github.com/ernie-mlg/rpunct.git  # adds punctuation to the transcript
--> !pip install gensim  # summarizes the whole text; note that gensim.summarization exists only in gensim releases before 4.0
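After running the installs, it can help to confirm that every module is importable before starting the pipeline. The check below is a minimal sketch: the helper name missing_modules and the REQUIRED_MODULES list are illustrative, and the import names are assumed to be the usual ones for these packages (for example, SpeechRecognition is imported as speech_recognition).

```python
import importlib.util

# Import names, which can differ from the pip package names.
REQUIRED_MODULES = ["speech_recognition", "youtube_dl", "pydub", "rpunct", "gensim", "os"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If the list is not empty, rerun the matching install line before continuing.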
--> I have divided this project into four phases:
- Download the audio in mp3 format.
- Convert the audio to text using the SpeechRecognition library.
- Add punctuation.
- Use a summarizer to summarize the whole transcript.
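The last three phases could be sketched roughly as follows, starting from an mp3 file that has already been downloaded. This is an illustration under assumptions, not the notebook's exact code: the function names, the audio.wav file name, and the 60-second chunk length are hypothetical; recognize_google is one of several recognizers SpeechRecognition offers; the RestorePuncts().punctuate call follows the upstream rpunct README and may differ in the fork installed above; and gensim.summarization requires a gensim release before 4.0.

```python
import math


def chunk_count(total_seconds, chunk_seconds=60):
    """Number of fixed-length chunks needed to cover the whole recording."""
    return math.ceil(total_seconds / chunk_seconds)


def mp3_to_wav(mp3_path, wav_path="audio.wav"):
    """Convert mp3 to wav with pydub (requires ffmpeg on the system)."""
    from pydub import AudioSegment
    AudioSegment.from_mp3(mp3_path).export(wav_path, format="wav")
    return wav_path


def transcribe(wav_path, chunk_seconds=60):
    """Speech to text, one chunk at a time, so long recordings stay manageable."""
    import speech_recognition as sr
    recognizer = sr.Recognizer()
    pieces = []
    with sr.AudioFile(wav_path) as source:
        for _ in range(chunk_count(source.DURATION, chunk_seconds)):
            # Consecutive record() calls continue where the previous one stopped.
            audio = recognizer.record(source, duration=chunk_seconds)
            try:
                pieces.append(recognizer.recognize_google(audio))
            except sr.UnknownValueError:
                continue  # nothing intelligible in this chunk
    return " ".join(pieces)


def punctuate(text):
    """Restore punctuation (API assumed from the upstream rpunct README)."""
    from rpunct import RestorePuncts
    return RestorePuncts().punctuate(text)


def summarize_text(text, ratio=0.2):
    """Extractive summary; gensim.summarization exists only in gensim < 4.0."""
    from gensim.summarization import summarize
    return summarize(text, ratio=ratio)


def summarize_audio(mp3_path):
    """Run conversion, transcription, punctuation, and summarization in order."""
    wav_path = mp3_to_wav(mp3_path)
    return summarize_text(punctuate(transcribe(wav_path)))
```

Calling summarize_audio("audio.mp3") would run all three phases in order. Transcribing in chunks is a design choice here because a full panel discussion is usually too long to send to a web recognizer in a single request.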
--> Import this file into Google Colab and run each cell.
--> To test with another video, paste its link into this line: "ydl.download(['https://www.youtube.com/watch?v=xb98qYIfNZ4'])".
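One way to make the link easier to swap is to keep it in a variable and pass it to a small download helper. The sketch below assumes the notebook uses youtube_dl's FFmpegExtractAudio post-processor to produce the mp3; the names build_ydl_options, download_audio, and VIDEO_URL, the "audio" output stem, and the quality setting are all hypothetical.

```python
def build_ydl_options(output_stem="audio"):
    """youtube_dl options that keep only the audio track and convert it to mp3."""
    return {
        "format": "bestaudio/best",
        "outtmpl": output_stem + ".%(ext)s",
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }],
    }


def download_audio(url, output_stem="audio"):
    """Download the audio of one video and return the expected mp3 file name."""
    import youtube_dl  # imported here so the options helper works without it
    with youtube_dl.YoutubeDL(build_ydl_options(output_stem)) as ydl:
        ydl.download([url])
    return output_stem + ".mp3"


VIDEO_URL = "https://www.youtube.com/watch?v=xb98qYIfNZ4"
# Uncomment to download (needs network access and ffmpeg):
# mp3_path = download_audio(VIDEO_URL)
```

With this layout, testing a different panel discussion only requires changing VIDEO_URL.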