andysingal / Audio-LLM Public


The purpose of this repository is to discuss audio transformers.


Audio-LLM


Resource:

MusicGen https://til.simonwillison.net/machinelearning/musicgen
Pop2piano https://huggingface.co/docs/transformers/model_doc/pop2piano
audiolm-pytorch https://github.com/lucidrains/audiolm-pytorch
Music-Genre-Classification https://github.com/yadgire7/Music-Genre-Classification
Samples https://github.com/julianstefinovic/thesis-audio-examples
audio-quality-assessment https://github.com/ashutoshc8101/audio-quality-assessment/tree/main/notebooks
Voice2Summary https://github.com/alimirash/Voice2Summary/blob/main/Voice2Summary.ipynb
HuggingSound https://github.com/jonatasgrosman/huggingsound
audio-instrument-classification https://github.com/qthuy2k1/audio-instrument-classification
Is it Pop or Rock? Classify songs with Hugging Face 🤗 and Ray on Vertex AI https://medium.com/google-cloud/is-it-pop-or-rock-classify-songs-with-hugging-face-and-ray-on-vertex-ai-34b3ef1175f8

Cool Git Repos

SeisCLIP https://github.com/sixu0/SeisCLIP/tree/main/Zero_shot
Multimodal Argumentation Mining https://github.com/StefanoColamonaco/Multimodal-AM/blob/main/main.ipynb
Social-IQ-2.0-Multimodal-with-Emotional-Cues https://github.com/Derekxbj/Social-IQ-2.0-Multimodal-with-Emotional-Cues
AssemblyAI-Medical-Transcription-Analysis
whisperX
mini-omni

Optimum Models:

Speaker-Diarization

https://huggingface.co/spaces/vumichien/Whisper_speaker_diarization/blob/main/app.py
https://medium.com/@pierre_guillou/speech-to-text-get-transcription-with-speakers-from-large-audio-file-in-any-language-openai-8da2312f1617
https://github.com/piegu/language-models/blob/master/speech_to_text_transcription_with_speakers_Whisper_Transcription_%2B_NeMo_Diarization.ipynb
https://docs.openvino.ai/2023.3/notebooks/212-pyannote-speaker-diarization-with-output.html
CrisperWhisper: an advanced variant of OpenAI's Whisper, designed for fast, precise, and verbatim speech recognition with accurate (crisp) word-level timestamps
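
The speaker-diarization resources above pair a transcription model (such as Whisper) with a separate diarization model (such as pyannote or NeMo), then combine the two outputs. A minimal sketch of that combining step is shown here, assuming each transcript segment is given the speaker whose turns overlap it the most in time. The `Segment` and `Turn` classes, the `assign_speakers` function, and the sample timestamps are hypothetical and only illustrate the idea; they are not taken from any of the linked projects, which may merge the outputs differently.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical stand-in for ASR output)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A span attributed to one speaker (hypothetical stand-in for diarization output)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds shared by two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turns overlap it the most."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Fall back to a placeholder when no speaker turn covers the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled


# Made-up sample data for illustration only.
segments = [
    Segment(0.0, 2.5, "Hello, how are you?"),
    Segment(2.5, 5.0, "I am fine, thanks."),
    Segment(9.0, 10.0, "(background noise)"),
]
turns = [
    Turn(0.0, 2.4, "SPEAKER_00"),
    Turn(2.4, 5.2, "SPEAKER_01"),
]

labeled = assign_speakers(segments, turns)
for speaker, text in labeled:
    print(f"{speaker}: {text}")
```

Summing overlap per speaker, rather than taking the single longest turn, keeps the label stable when the diarizer splits one speaker into several short turns. Word-level timestamps, as offered by tools such as whisperX or CrisperWhisper, would allow the same matching to be applied per word instead of per segment.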

Resources:

Convert Text to Speech with Python

voice-chat-with-mistral

Notebooks Using LangChain Agents to Build a Music Recommendation System

Video Chat with Multimodal RAG on Jetson

Langchain-Voice

webrtc-ai-voice-chat

Multimodal LLM with LangChain

Image-audio-captioning-with-ai

Local Voice Chatbot: Ollama + HF Transformers + Coqui TTS Toolkit--June

speech-to-speech

Blog:
