This project is a plugin for automatic subtitling in BigBlueButton (BBB), an open source web conferencing system. bbb-live-subtitles will run real time automatic speech recognition (ASR) and will generate subtitle captions on-the-fly. No cloud services are used for ASR, instead we use our own speech recognition models that can be run locally. This ensures that no privacy issues arise. There are pre-trained German and English models available and ready to use (English needs PyKaldi > 0.2.0 and Python 3.8).
See our plugin in action in this presentation video.
Currently, each BBB participant is subtitled individually. We use Kaldi/pyKaldi for automatic speech recognition (ASR). Any nnet3 compatible Kaldi model can be used. We offer free and ready to use models for German ASR and an English model is available as well. Also the subtitles are written into the Shared Notes and can be exported as PDF or a Word Document.
Tested with BigBlueButton 2.2.x, Ubuntu 20.04, Python 3.8 and kaldi-model-server
# Make sure you have Python 3.8 installed, its dev package and other dependencies: (Python 3.7 could work also. There is a PyKaldi wheel for 3.7 also)
sudo apt-get install python3.8 python3.8-dev portaudio19-dev virtualenv automake autoconf sox gfortran libtool subversion
# Now clone the bbb-live-subtitles package somewhere:
mkdir ~/projects
cd ~/projects
git clone https://github.com/uhh-lt/bbb-live-subtitles
cd bbb-live-subtitles/
# create the virtual environment and install the dependencies
virtualenv -p /usr/bin/python3.8 bbbsub_env
source bbbsub_env/bin/activate
pip install redis pymongo jaspion pyyaml pyaudio samplerate scipy pyetherpadlite websockets
# Clone Kaldi-Model-Server
git clone https://github.com/uhh-lt/kaldi-model-server.git
# Install PyKaldi
# Download the PyKaldi wheel and install it
# PyKaldi 0.2.0:
wget http://ltdata1.informatik.uni-hamburg.de/pykaldi_tmp/pykaldi-0.2.1-cp38-cp38-linux_x86_64.whl
pip install pykaldi-0.2.1-cp38-cp38-linux_x86_64.whl
# Install Kaldi and Intel MKL (see note below if you have a different CPU than Intel)
./install_mkl.sh
./install_kaldi_intel.sh ~/projects/bbb-live-subtitles/bbbsub_env/bin/python3.8
# OR if you have a non-Intel CPU:
./install_kaldi.sh ~/projects/bbb-live-subtitles/bbbsub_env/bin/python3.8
# Download the english and german Model
cd kaldi-model-server
./download_example_models.sh
When working with different machines (see Section "usage") the configuration of the redis server must be changed to allow remote access, as outlined below.
If you like to host the speech recognition on another server in your local network, you need to allow connections to redis in your local network. When all scripts run on the same machine you can skip this step
Open the redis configuration file in an editor
sudo nano /etc/redis/redis.conf
Change the line bind 127.0.0.1
and add the local IP Adress of the server(eg. 192.168.0.1): bind 127.0.0.1 192.168.0.1
.
Save the file and restart the redis server
sudo /etc/init.d/redis-server restart
This would bind redis to the local IP Adress 192.168.0.1 as well as 127.0.0.1. You can now test the access from another machine within your network with redis-cli
for example:
redis-cli -h 192.168.0.1 -p 6379
Note that you shouldn't let Redis listen on a public IP, as you would otherwise expose raw speech data packages and other data to the public. If in doubt consult your admin about network and firewall settings and make sure that Redis can only be accessed from trusted hosts.
When using iptables as your firewall you can add a rule like this:
iptables -A INPUT -p tcp --destination-port 6379 -m iprange --src-range 192.168.0.1-192.168.0.254 -j ACCEPT
This example opens the Port 6379 for TCP connections to a range of IP adresses between 192.168.0.1 and 192.168.0.254.
To fork the audio we use the mod_audio_fork Plugin. This plugin needs libwebsockets to be installed.
We compiled already a compatible version so you only need to run the script with sudo
to download and install it:
sudo ./install_audio_fork.sh
Open the modules file with your favorite editor:
sudo nano /opt/freeswitch/etc/freeswitch/autoload_configs/modules.conf.xml
Add the following at the end of the file but above </modules>
:
<load module="mod_audio_fork"/>
Save the changes and restart freeswitch:
sudo service freeswitch restart
To use the script you need to add your Server and API Keys.
Add in line 9 your BBB hostname (or localhost).
Add in line 14 your Freeswitch password.
The password is stored in the file /opt/freeswitch/etc/freeswitch/autoload_configs/event_socket.conf.xml
Add in line 62 your BBB hostname (or localhost).
To run the mongodbconnector you need to get the Etherpad API Key.
The Key is stored in /usr/share/etherpad-lite/APIKEY.txt
To use this project you can run every script on remote machines or if your machine is fast enough all services on the same machine. All the scripts need to run before the participants join the conference. ASR processing is only loaded once it is needed, otherwise the script stand by and wait for participants to join conferences. After the participant leaves the conference or leaves the audio of the conference the ASR stops. When the participant joins back the audio with microphone the ASR starts.
To run multiple Shells it is recommended to use tmux.
At first you need to start esl_to_redis.py
. This module creates a connection to the FreeSWITCH Software through ESL. It checks for new partitipants with audio and sends a message into the redis information channel.
The next to start is ws_receiver.py
. This module starts a webSocket server and waits for incoming data. If a stream starts it copys the audiostream to a new REDIS-Channel.
The kaldi_starter.py
module is in this configuration on another machine (could also run on the same machine) and starts for each media bug a seperate pykaldi instance with the Kaldi-model-server. The kaldi-model-server sends the transcrived speech back into a separate redis channel.
python3 kaldi_starter.py -s server -c asr_channel
With the mongodbconnector.py
module the ASR Data is written into the mongodb database and into the Notes. To see subtitles, the presenter needs to start the subtitle functionanilty in BBB and then the participants can activate the subtitles by clicking on the CC button.
Start mongodbconnector.py
with your parameters:
python3 mongodbconnector.py -s BBB_SERVER -c asr_channel -e ETHERPAD_API_KEY
To run all scripts except Kaldi you can also adapt and use the script run.sh
. It runs es_to_redis
, ws_receiver
and mongodbconnector
in one tmux session.
As a moderator:
Open the menu and click on "write closed captions"
Select in the dropdown menu german(Deutsch) and click Start
As a participant: Click on the CC Button on the bottom of the conference
Select as the Caption Language german (Deutsch) an click Start
Now you see the captions at the bottom of the conference
There is an issue with Mozilla Firefox and recordings in freeswitch with opus. To use this project with Firefox you need to deactivate the opus codec.
Open the modules file with your favorite editor:
sudo nano /opt/freeswitch/etc/freeswitch/autoload_configs/modules.conf.xml
The mod_opus
must be commented out:
<-- <load module="mod_opus"/>-->
save and restart freeswitch:
sudo service freeswitch restart
Feel free to write an issue, pull-request or a mail :)
If you use the plugin, please cite our paper:
@inproceedings{geislinger21_interspeech,
author={Robert Geislinger and Benjamin Milde and Timo Baumann and Chris Biemann},
title={{Live Subtitling for BigBlueButton with Open-Source Software}},
year=2021,
booktitle={Proc. Interspeech 2021},
pages={3319--3320},
address={Brno, Czech Republic}
}