The purpose of this repository is to discuss audio transformers.
Resources:
- https://til.simonwillison.net/machinelearning/musicgen
- Pop2piano https://huggingface.co/docs/transformers/model_doc/pop2piano
- audiolm-pytorch https://github.com/lucidrains/audiolm-pytorch
- Music-Genre-Classification https://github.com/yadgire7/Music-Genre-Classification
- Samples https://github.com/julianstefinovic/thesis-audio-examples
- audio-quality-assessment https://github.com/ashutoshc8101/audio-quality-assessment/tree/main/notebooks
- Voice2Summary https://github.com/alimirash/Voice2Summary/blob/main/Voice2Summary.ipynb
- HuggingSound https://github.com/jonatasgrosman/huggingsound
- audio-instrument-classification https://github.com/qthuy2k1/audio-instrument-classification
- Is it Pop or Rock? Classify songs with Hugging Face 🤗 and Ray on Vertex AI https://medium.com/google-cloud/is-it-pop-or-rock-classify-songs-with-hugging-face-and-ray-on-vertex-ai-34b3ef1175f8
Cool Git Repos
- SeisCLIP https://github.com/sixu0/SeisCLIP/tree/main/Zero_shot
- Multimodal Argumentation Mining https://github.com/StefanoColamonaco/Multimodal-AM/blob/main/main.ipynb
- Social-IQ-2.0-Multimodal-with-Emotional-Cues https://github.com/Derekxbj/Social-IQ-2.0-Multimodal-with-Emotional-Cues
- AssemblyAI-Medical-Transcription-Analysis
- whisperX
- mini-omni
Optimum Models:
- https://huggingface.co/helenai/MIT-ast-finetuned-speech-commands-v2-ov
- https://huggingface.co/docs/optimum/intel/inference#export-and-inference-of-stable-diffusion-models
- https://huggingface.co/blog/fine-tune-w2v2-bert
- https://huggingface.co/nyrahealth/CrisperWhisper [CrisperWhisper is an advanced variant of OpenAI's Whisper, designed for fast, precise, and verbatim speech recognition with accurate (crisp) word-level timestamps]
Speaker-Diarization
- https://huggingface.co/spaces/vumichien/Whisper_speaker_diarization/blob/main/app.py
- https://medium.com/@pierre_guillou/speech-to-text-get-transcription-with-speakers-from-large-audio-file-in-any-language-openai-8da2312f1617
- https://github.com/piegu/language-models/blob/master/speech_to_text_transcription_with_speakers_Whisper_Transcription_%2B_NeMo_Diarization.ipynb
- https://docs.openvino.ai/2023.3/notebooks/212-pyannote-speaker-diarization-with-output.html
Resources:
- Convert Text to Speech with Python voice-chat-with-mistral
- Notebooks Using LangChain Agents to Build a Music Recommendation System
- Video Chat with Multimodal RAG on Jetson
- Langchain-Voice webrtc-ai-voice-chat
- Image-audio-captioning-with-ai
- Local Voice Chatbot: Ollama + HF Transformers + Coqui TTS Toolkit--June
Blog: