soham97/ADIFF

ADIFF: Explaining audio difference using natural language

[Paper] [Checkpoint]

This repository hosts the Audio Difference Explanation datasets and the ADIFF checkpoint. ADIFF is an audio prefix-tuning-based language model with a cross-projection module, trained in a three-step process. It takes two audio clips and a text prompt as input and produces difference explanations at different tiers of detail. These explanations identify and describe audio events, acoustic scenes, and signal characteristics, as well as their emotional impact on listeners.

Setup

  1. Install the required dependencies with pip install -r requirements.txt. If you use conda, run the following instead:
cd adiff && \
conda create -n adiff python=3.8 && \
conda activate adiff && \
pip install -r requirements.txt
  2. Download the ADIFF weights: Pretrained Model [Zenodo]
  3. Move adiff_base.pth into the config folder.
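Before loading the model, it can help to confirm that the weights landed in the config folder. The sketch below is a minimal, hypothetical helper (find_checkpoint is not part of the ADIFF codebase); it assumes the folder is named config relative to where you run it, and it accepts both the .pth and .ckpt extensions because both appear in this README. It uses typing.List so it also runs on the Python 3.8 environment created above.

```python
from pathlib import Path
from typing import List

# Extensions used for the weights in this README: .pth in the setup steps,
# .ckpt in the usage section.
CHECKPOINT_SUFFIXES = (".pth", ".ckpt")


def find_checkpoint(config_dir: str = "config") -> List[Path]:
    """Return checkpoint files found directly under config_dir.

    Hypothetical helper, not part of the ADIFF codebase.
    """
    folder = Path(config_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"config folder not found: {folder.resolve()}")
    # Keep only regular files whose extension matches a known checkpoint suffix.
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES)
```

Calling find_checkpoint() from the repository root returns an empty list if the weights have not been moved yet, and raises FileNotFoundError if the config folder itself is missing.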

Usage

The ADIFF wrapper class provides a simple interface to the model. It requires the following inputs:

  • config: The model configuration. The only supported option is "base".
  • model_path: Path to the model weights, either adiff_base.ckpt or adiff_base_wavcaps.ckpt. The second checkpoint is trained on WavCaps difference data in addition to ACD and CLD; it can also detect similarities between two audio clips and covers a wider range of concepts.
  • examples: A list of examples, passed to generate. Each example is a list of three entries: audiopath1, audiopath2, and prompt.
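Because each example must be a three-entry list, a quick structural check before calling generate can catch malformed inputs early. This is a minimal sketch: validate_examples is a hypothetical helper, and it only checks the structure described above (two audio paths and a non-empty prompt), optionally confirming that the audio files exist. It does not reflect any validation the wrapper itself may perform.

```python
import os
from typing import List, Sequence


def validate_examples(examples: Sequence[Sequence[str]], check_files: bool = True) -> List[List[str]]:
    """Check that each example is [audiopath1, audiopath2, prompt].

    Hypothetical helper; returns the examples as plain lists.
    """
    cleaned = []
    for i, example in enumerate(examples):
        if len(example) != 3:
            raise ValueError(f"example {i} has {len(example)} entries, expected 3")
        path1, path2, prompt = example
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError(f"example {i} has an empty prompt")
        if check_files:
            # Both audio paths must point to existing files on disk.
            for path in (path1, path2):
                if not os.path.isfile(path):
                    raise FileNotFoundError(f"example {i}: audio file not found: {path}")
        cleaned.append([path1, path2, prompt])
    return cleaned
```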

Supported functions:

  • generate: Produces a text response for the given audio inputs and text prompt.

Generate difference explanation

from wrapper import ADIFF

# config: "base"; model_path: adiff_base.ckpt or adiff_base_wavcaps.ckpt
adiff = ADIFF(config="<choice of config>", model_path="<model weights>")

# Each example is [audiopath1, audiopath2, prompt]
examples = [
    ["<path1>", "<path2>", "explain the difference between the two audio in detail"],
    ["<path1>", "<path2>", "explain the difference between the two audio in one extended sentence"],
    ["<path1>", "<path2>", "explain the difference between the two audio in few words"],
]

response = adiff.generate(examples=examples, max_len=300, temperature=1.0)
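To request all three tiers of explanation for several pairs of recordings, the examples list can be built programmatically. The sketch below reuses the three prompt strings from the snippet above verbatim, on the assumption that the model may be sensitive to prompt wording. The helper name make_difference_examples and the tier labels (detail, sentence, fewwords, borrowed from the dataset file names) are illustrative assumptions. It only builds inputs; the structure of the value returned by generate is not documented here, so it is not touched.

```python
from typing import Dict, List, Optional, Tuple

# Prompts copied verbatim from the README example, keyed by an illustrative tier label.
TIER_PROMPTS: Dict[str, str] = {
    "detail": "explain the difference between the two audio in detail",
    "sentence": "explain the difference between the two audio in one extended sentence",
    "fewwords": "explain the difference between the two audio in few words",
}


def make_difference_examples(
    pairs: List[Tuple[str, str]], tiers: Optional[List[str]] = None
) -> List[List[str]]:
    """Expand (audiopath1, audiopath2) pairs into examples, one per requested tier.

    Hypothetical helper; defaults to all three tiers in TIER_PROMPTS order.
    """
    selected = tiers if tiers is not None else list(TIER_PROMPTS)
    return [[path1, path2, TIER_PROMPTS[tier]] for path1, path2 in pairs for tier in selected]
```

For example, make_difference_examples([("a.wav", "b.wav")]) yields three examples for that pair, which can then be passed to adiff.generate(examples=...) as above.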

Generate audio captions

from wrapper import ADIFF

# config: "base"; model_path: adiff_base.ckpt or adiff_base_wavcaps.ckpt
adiff = ADIFF(config="<choice of config>", model_path="<model weights>")

# Each example is [audiopath1, audiopath2, prompt]
examples = [
    ["<path1>", "<path2>", "caption the first audio"],
    ["<path1>", "<path2>", "caption the second audio"],
    ["<path1>", "<path2>", "caption both the audios"],
]

response = adiff.generate(examples=examples, max_len=300, temperature=1.0)

Dataset

The ACD and CLD datasets source their audio files from the AudioCaps and Clotho datasets, respectively. The .csv files for the three tiers of difference annotations are located under the data folder:

.
├── ...
├── data
│   ├── ACD         # AudioCaps Difference Explanation
│   │   ├── acd_test_adiff_fewwords_answer.csv
│   │   ├── acd_test_adiff_sentence_answer.csv
│   │   ├── acd_test_adiff_detail_answer.csv
│   │   ├── ...
│   ├── CLD         # Clotho Difference Explanation
│   │   ├── cld_evaluation_adiff_fewwords_answer.csv
│   │   ├── cld_evaluation_adiff_sentence_answer.csv
│   │   ├── cld_evaluation_adiff_detail_answer.csv
│   │   ├── ...
└── ...
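The annotation files follow a consistent naming pattern, so they can be grouped by dataset and tier from their file names alone. The sketch below is a hypothetical helper (annotation_files is not part of the repository) that assumes the data/ACD and data/CLD layout shown above and the tier keywords fewwords, sentence, and detail from the file names. The CSV column layout is not described here, so the sketch only locates the files and leaves reading them to you.

```python
from pathlib import Path
from typing import Dict, List

# Tier keywords taken from the annotation file names in the tree above.
TIERS = ("fewwords", "sentence", "detail")


def annotation_files(data_dir: str = "data") -> Dict[str, Dict[str, List[Path]]]:
    """Group difference-annotation CSVs by dataset folder (ACD, CLD) and tier.

    Hypothetical helper; a missing dataset folder simply yields empty lists.
    """
    grouped: Dict[str, Dict[str, List[Path]]] = {}
    for dataset in ("ACD", "CLD"):
        folder = Path(data_dir) / dataset
        by_tier: Dict[str, List[Path]] = {tier: [] for tier in TIERS}
        for csv_path in sorted(folder.glob("*.csv")):
            # Match the tier keyword as a whole underscore-delimited token.
            for tier in TIERS:
                if f"_{tier}_" in csv_path.name:
                    by_tier[tier].append(csv_path)
        grouped[dataset] = by_tier
    return grouped
```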

The audio files can be downloaded from their respective hosting websites: Clotho and AudioCaps.

Leaderboard

To add a model to the leaderboard, please open a pull request.

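| Model | Decoding | CLD-1 (SPICE) | CLD-2 (SPICE) | CLD-3 (SPICE) | ACD-1 (SPICE) | ACD-2 (SPICE) | ACD-3 (SPICE) |
|-------|----------|---------------|---------------|---------------|---------------|---------------|---------------|
| ADIFF | Greedy   | 11.85         | 23.15         | 16.67         | 12.68         | 22.16         | 17.07         |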

Citation

@inproceedings{
    anonymous2025adiff,
    title={{ADIFF}: Explaining audio difference using natural language},
    author={Anonymous},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=l4fMj4Vnly}
}
