A Django-based search engine and transcript generator for The Skeptics' Guide to the Universe (SGU) podcast.

MarcinMG-web/sgu_transcript_generator

 
 

Transcripts Generator & Search Engine (TGSE)

A converter from English-language audio files to text, paired with a search engine. Transcription aims to get grammar, casing, and punctuation right. The search engine lets users enter a text query and play an audio file from the exact location where the query occurs. Among other uses, it is a unique tool for podcasting, because it makes searching through audio files feel like searching through text. Currently applied to The Skeptics' Guide to the Universe podcast.

Functionality

Submit for transcription

Get transcripts

Search

Like this project?

Consider becoming a patron at https://www.patreon.com/maciejgierada

Contributions

Contributions are highly welcome! There is still a lot of work to be done!

How to run locally

The TGSE backend is Django-based, so to run it locally:

# navigate to path where you will keep the project
cd path_to_install
# clone the repo (if you are planning to contribute, fork the repo and clone it)
git clone https://github.com/mgierada/TGSE.git
# enter the repo's root directory
cd TGSE
# create a virtual environment
python3 -m venv sgu-tse_venv
# activate the environment
source sgu-tse_venv/bin/activate
# upgrade pip
python3 -m pip install --upgrade pip
# install the project's dependencies
python3 -m pip install -r requirements.txt
# run local server
python3 manage.py runserver
# open browser at http://127.0.0.1:8000/

REST API

A polished REST API is not my main goal at the moment; however, there are a couple of endpoints you can access. More will come later:

endpoint                          feature                          method
episodes/                         get details of all episodes      GET
episodes/<int:episode_number>/    get details of a given episode   GET
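
As a rough illustration of how the two endpoints above might be called from Python, here is a minimal sketch using only the standard library. It assumes the local development server from the previous section is running at http://127.0.0.1:8000/, that the endpoints are mounted at the site root, and that responses are JSON; none of these is confirmed by this README. The helper names `episode_url` and `fetch_json` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: the dev server started by `manage.py runserver` listens here
# and the endpoints are mounted at the site root.
BASE_URL = "http://127.0.0.1:8000/"


def episode_url(episode_number=None, base_url=BASE_URL):
    """Build the URL for the episode list, or for one episode if a number is given."""
    root = base_url.rstrip("/") + "/episodes/"
    if episode_number is None:
        return root
    return f"{root}{int(episode_number)}/"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the body as JSON (response format is assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `fetch_json(episode_url())` would request all episodes and `fetch_json(episode_url(800))` a single one; the episode number 800 is only a placeholder.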

Wish List

  • better design
  • set up an event listener to check for new episodes, get details, submit for transcription, get the transcript, and populate the DB in an automated fashion
  • use timestamps to navigate to the exact moment in the audio file matching the query
  • better transcript quality
  • improved search engine that can also find almost exact matches
  • refactoring
  • documentation
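
The timestamp and almost-exact-match items above could be approached together. The sketch below is one possible way, not the project's implementation: it slides a window over a list of timestamped words and scores each window against the query with `difflib.SequenceMatcher` from the standard library. The input format (a list of dicts with `text` and `start` in milliseconds), the function name `find_approximate`, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def find_approximate(query, words, threshold=0.8):
    """Return (start_ms, score) for the best-matching window of words, or None.

    `words` is assumed to be a list of dicts such as
    {"text": "guide", "start": 900}, with `start` in milliseconds.
    """
    query_tokens = query.lower().split()
    n = len(query_tokens)
    if n == 0 or len(words) < n:
        return None
    target = " ".join(query_tokens)
    best = None
    # Compare the query with every run of n consecutive transcript words.
    for i in range(len(words) - n + 1):
        window = " ".join(w["text"].lower() for w in words[i:i + n])
        score = SequenceMatcher(None, target, window).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (words[i]["start"], score)
    return best
```

The returned start time could then be used as the seek position in the audio player. A linear scan like this is only practical for a single transcript; searching across many episodes would still rely on the indexed search backend.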

Tech Stack

  • Python
  • HTML/CSS
  • JavaScript
  • Django
  • PostgreSQL
  • Selenium
  • AssemblyAI
  • Haystack
  • Heroku
  • CI/CD pipelines
