WhisperX stands out as the most versatile and feature-rich Whisper variant. [1] It outperforms the original Whisper in word segmentation, word error rate (WER), and transcription speed. [2]
And you can run it serverless on AWS Lambda. 🚀
aws cloudformation deploy \
--template-file whisperx-on-lambda.yaml \
--stack-name whisperx-on-lambda \
--capabilities CAPABILITY_IAM
Transcribing a 20-second audio sample takes about 6.5 seconds. At that rate, processing 1000 such samples costs around $1.
Note: memory usage and processing speed have not been optimized!
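The figure of around $1 per 1000 samples can be sanity-checked with some quick arithmetic. The sketch below is a rough estimate only. `estimate_cost` is a hypothetical helper, and both the 10240 MB memory setting and the price of $0.0000166667 per GB-second are assumptions rather than values taken from this repository, so check the deployed function's configuration and current AWS Lambda pricing. The small per-request charge is ignored.

```shell
#!/usr/bin/env bash
# Rough AWS Lambda compute cost: samples * seconds * GB * price per GB-second.
# ASSUMPTIONS (not from this README): 10240 MB of memory and a price of
# 0.0000166667 USD per GB-second. Pass your own values as arguments 3 and 4.
estimate_cost() {
  local samples="$1" seconds_per_sample="$2"
  local memory_mb="${3:-10240}" price_per_gb_second="${4:-0.0000166667}"
  awk -v n="$samples" -v s="$seconds_per_sample" \
      -v m="$memory_mb" -v p="$price_per_gb_second" \
      'BEGIN { printf "%.2f\n", n * s * (m / 1024) * p }'
}

# 1000 samples at 6.5 seconds each, with the assumed defaults
estimate_cost 1000 6.5   # prints 1.08
```

A smaller memory setting lowers the price per second but will likely increase the processing time, so the two inputs should be measured together.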
base64 -i sample.mp3 > /tmp/sample.base64
aws lambda invoke --function-name whisperx-on-lambda --payload "{\"isBase64Encoded\": true, \"body\": \"$(cat /tmp/sample.base64 | tr -d '\n')\"}" --cli-binary-format raw-in-base64-out /tmp/output.json --log-type Tail --query 'LogResult' --output text | base64 -d
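For longer recordings, inlining the base64 string on the command line can become unwieldy, and synchronous Lambda invocations accept request payloads of at most 6 MB. A minimal sketch of an alternative is to write the same request JSON to a file first and pass it with `file://`. `build_request` is a hypothetical helper, not part of this repository.

```shell
#!/usr/bin/env bash
# build_request AUDIO_FILE OUT_JSON
# Writes {"isBase64Encoded": true, "body": "<base64 audio>"} to OUT_JSON and
# returns non-zero if the result exceeds the 6 MB synchronous invoke limit.
build_request() {
  local audio="$1" out="$2"
  local limit=$((6 * 1024 * 1024))
  {
    printf '{"isBase64Encoded": true, "body": "'
    base64 < "$audio" | tr -d '\n'   # strip line wrapping added by base64
    printf '"}\n'
  } > "$out"
  local size=$(($(wc -c < "$out")))
  if [ "$size" -gt "$limit" ]; then
    echo "request is $size bytes, above the 6 MB synchronous invoke limit" >&2
    return 1
  fi
}

# Usage (not executed here):
#   build_request sample.mp3 /tmp/request.json
#   aws lambda invoke --function-name whisperx-on-lambda \
#     --payload file:///tmp/request.json \
#     --cli-binary-format raw-in-base64-out /tmp/output.json
```

Because base64 encoding grows the data by roughly a third, the audio file itself needs to stay well under 6 MB for the request to fit.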
make build
docker run -p 9000:8080 whisperx-on-aws-lambda:latest
base64 -i sample.mp3 > sample.base64
echo "{\"isBase64Encoded\": true, \"body\": \"$(cat sample.base64 | tr -d '\n')\"}" > request.json && curl -X POST http://localhost:9000/2015-03-31/functions/function/invocations -H "Content-Type: application/json" --data-binary @request.json