About code"100-106" from dataloader.py #88

poult-lab · 2022-12-05T06:44:08Z

Dear Mr. Gong,
Thanks a lot for your pioneering work in the field of audio processing, and for your warm-hearted comments every time.
I have a question about the mixup method in AST. On line 102 of dataloader.py I saw waveform = waveform - waveform.mean().
Why does the mean of the waveform need to be subtracted from the waveform? Is this step part of the original mixup method, or is there another reason behind it?

YuanGongND · 2022-12-05T07:00:41Z

Hi there,

Thanks for reaching out.

I don't think waveform mean subtraction is related to mixup. Subtracting the mean of the waveform is a commonly used way to remove the DC offset. I do it before mixup just to be safe. I haven't conducted an experiment on the impact of waveform mean subtraction, but I guess the impact is minor, as we apply another normalization to the spectrogram afterwards. My guess is that if your training and test use a consistent dataloader, removing waveform = waveform - waveform.mean() would be fine. But since it is quite standard, I'd prefer to keep it there.
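As a rough illustration of the point above, here is a minimal sketch in plain Python (lists instead of tensors, so it runs without any dependencies). The signals, the offsets, the mixing weight, and the helper names remove_dc_offset and mixup are all made up for the example and are not taken from dataloader.py. It shows that subtracting the mean centers each waveform, and that a linear mix of two centered waveforms stays centered.

```python
import math


def remove_dc_offset(waveform):
    """Subtract the mean so the samples are centered on zero."""
    mean = math.fsum(waveform) / len(waveform)
    return [x - mean for x in waveform]


def mixup(wave_a, wave_b, lam):
    """Linear mix of two equal-length waveforms with weight lam in [0, 1]."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(wave_a, wave_b)]


# Hypothetical test signals: one second of audio at 16 kHz, each with a
# constant (DC) offset added on top of a sine tone.
sample_rate = 16000
wave_a = [math.sin(2 * math.pi * 100 * n / sample_rate) + 0.05
          for n in range(sample_rate)]
wave_b = [0.5 * math.sin(2 * math.pi * 250 * n / sample_rate) - 0.02
          for n in range(sample_rate)]

mean_a_before = math.fsum(wave_a) / len(wave_a)

# Center each waveform first, then mix (the weight 0.3 is arbitrary).
centered_a = remove_dc_offset(wave_a)
centered_b = remove_dc_offset(wave_b)
mixed = mixup(centered_a, centered_b, lam=0.3)

mean_a_after = math.fsum(centered_a) / len(centered_a)
mean_mixed = math.fsum(mixed) / len(mixed)

print("mean of wave_a before:", mean_a_before)
print("mean of wave_a after: ", mean_a_after)
print("mean of the mix:      ", mean_mixed)
```

Because the mix is linear, its mean is the weighted sum of the two input means, so centering the inputs is enough to center the mix in this toy setting.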

-Yuan

poult-lab · 2022-12-05T07:39:08Z

Thank you so much gentlemen.

poult-lab · 2022-12-06T00:55:46Z

Dear Mr. Gong,
Sorry to bother you again. I saw that you use the z-score normalization fbank = (fbank - self.norm_mean) / (self.norm_std * 2), so the mean and std become 0 and 0.5, respectively.
As far as I know, the usual z-score normalization is fbank = (fbank - self.norm_mean) / (self.norm_std), which gives a mean and std of 0 and 1.
My question is: did you choose the former based on experiments, or is there another reason behind it?

YuanGongND · 2022-12-06T04:10:15Z

I think I answered this in a previous issue (see here).

You are exactly right that fbank = (fbank - self.norm_mean) / (self.norm_std) is the standard method. But in my preliminary experiments, I found that restricting the input to a smaller variance leads to a minor performance improvement when ImageNet pretraining is used. My guess at the time was that the distribution of audio spectrograms differs from that of RGB images. However, in my follow-up experiments, I found the impact is very small, so you can use either one.
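To make the difference between the two variants concrete, here is a small sketch using only the Python standard library. The feature values are invented for the example, and the mean and std are computed from those same values, whereas the real dataloader uses precomputed dataset statistics (self.norm_mean, self.norm_std). The helper name normalize and its std_scale parameter are hypothetical.

```python
import statistics


def normalize(values, mean, std, std_scale=1.0):
    """Shift by mean and divide by std * std_scale."""
    return [(v - mean) / (std * std_scale) for v in values]


# Hypothetical log-mel filterbank values (not taken from any real dataset).
fbank_values = [-9.1, -6.3, -4.2, -2.0, -0.4, 1.7, -7.5, -3.3]

norm_mean = statistics.fmean(fbank_values)
norm_std = statistics.pstdev(fbank_values)

# Standard z-score normalization: divide by std.
standard = normalize(fbank_values, norm_mean, norm_std, std_scale=1.0)
# Variant discussed in this thread: divide by 2 * std.
halved = normalize(fbank_values, norm_mean, norm_std, std_scale=2.0)

print("standard: mean", statistics.fmean(standard),
      "std", statistics.pstdev(standard))
print("halved:   mean", statistics.fmean(halved),
      "std", statistics.pstdev(halved))
```

Dividing by 2 * std only rescales the input by a constant factor of one half, which is why the choice matters less than applying the same choice in both training and inference.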

Again, I want to emphasize that although it doesn't matter which one you use, it is important to keep it consistent between training and inference. Specifically, if you want to use our pretrained model, please stick to our dataloader without any change.

Finally, I recommend first running our original code to see if you can reproduce our claimed results. If yes, then you can play with the model with various settings.

-Yuan

boschhd · 2023-05-05T12:51:05Z

Dear Yuan,

thank you for the great code repository and for maintaining it! I have a small follow up question regarding the waveform normalization (waveform = waveform - waveform.mean()) in dataloader.py and its absence in predict.py.

The dataloader is used for generating the normalization stats and training and it normalizes the waveform before the transition into the frequency domain. The predict code however loads the audio itself and has no waveform normalization.
Do you think that makes a difference?

Harald

YuanGongND · 2023-05-05T19:19:28Z

@boschhd

hi Harald,

This is not intentional. https://github.com/YuanGongND/ast/blob/master/egs/audioset/inference.py was not written by me but contributed by the community. I would have added waveform = waveform - waveform.mean() if I were the author.

That said, this is a minor thing that just removes the DC constant; if you check waveform.mean(), it is usually a small value.
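One way to see why this only removes the DC constant is to look at the signal in the frequency domain. The sketch below is a toy example in plain Python, with a made-up 64-sample signal and a hypothetical helper dft_magnitude that evaluates a single DFT bin directly from the definition. It is not the fbank computation used in the repository. Subtracting the mean zeroes bin 0 and leaves the bin that holds the tone unchanged.

```python
import cmath
import math


def dft_magnitude(signal, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n_samples = len(signal)
    total = sum(x * cmath.exp(-2j * cmath.pi * k * n / n_samples)
                for n, x in enumerate(signal))
    return abs(total)


# Hypothetical signal: a sine with exactly 4 cycles in 64 samples,
# plus a small constant (DC) offset.
n_samples = 64
dc_offset = 0.05
signal = [math.sin(2 * math.pi * 4 * n / n_samples) + dc_offset
          for n in range(n_samples)]

# Same operation as waveform = waveform - waveform.mean().
mean = math.fsum(signal) / n_samples
centered = [x - mean for x in signal]

dc_before = dft_magnitude(signal, 0)     # bin 0 carries the DC offset
dc_after = dft_magnitude(centered, 0)
tone_before = dft_magnitude(signal, 4)   # bin 4 carries the sine tone
tone_after = dft_magnitude(centered, 4)

print("bin 0 before/after:", dc_before, dc_after)
print("bin 4 before/after:", tone_before, tone_after)
```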

What really makes a big difference is the spectrogram normalization at ast/src/dataloader.py, line 202 in 9e3bd99:

fbank = (fbank - self.norm_mean) / (self.norm_std * 2)

Without DC removal, the code probably still runs well; without the fbank norm, the inference is almost sure to fail.

Finally, I recommend using https://colab.research.google.com/github/YuanGongND/ast/blob/master/colab/AST_Inference_Demo.ipynb for inference instead of inference.py. It was written by me and provides more functions (e.g., attention maps).

-Yuan

YuanGongND added the question label Dec 5, 2022

poult-lab closed this as completed Dec 5, 2022

poult-lab reopened this Dec 6, 2022

poult-lab closed this as completed Dec 6, 2022

poult-lab reopened this Dec 9, 2022
