Some questions about the details of AST. #81

TungyuYoung · 2022-09-17T14:45:15Z

I would like to know how to explain why ImageNet-pretrained models can be used to classify audio from spectrograms. Most of the images in ImageNet are everyday photos of things such as cats, dogs, and cars. Are the features of these images or objects correlated with those of audio spectrograms? Why can knowledge learned from natural images be transferred to the classification of spectrograms?

I would appreciate it if you could answer my questions.

YuanGongND · 2022-10-09T04:23:12Z

Hi there,

This is an interesting question, but I don't have a clear answer. It is worth noting that using ImageNet (IN) pretraining for audio tasks is not new to AST; it can be traced back to 2014.

-Yuan

YuanGongND added the "question" label (Further information is requested) on Oct 9, 2022
