How can we explain that audio classification works with ImageNet-pretrained models applied to spectrograms? As we all know, most of the images in ImageNet are ordinary photos of daily life, such as cats, dogs, and cars. Are the features of these images/objects correlated with audio spectrograms? Why can the knowledge learned from natural images be transferred to spectrogram classification?
I would appreciate it if you could answer my questions.
This is an interesting question, but I don't have a clear answer. It is worth noting that using ImageNet pretraining for audio tasks is not new with AST; it can be traced back to 2014.
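For readers unfamiliar with the setup being discussed, below is a minimal sketch of the general recipe: treat the log-mel spectrogram as an image and fine-tune an ImageNet-pretrained backbone on it. This is the common cross-modal transfer-learning pattern, not AST's exact pipeline (AST uses a ViT with patch embeddings). It assumes PyTorch, torchaudio, and a recent torchvision; the 16 kHz mono input and `num_classes` value are placeholders.

```python
import torch
import torchaudio
import torchvision

num_classes = 50  # placeholder, e.g. ESC-50; set to your dataset's label count

# 1. Waveform -> log-mel spectrogram (a 2D time-frequency "image" of the audio).
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, hop_length=160, n_mels=128
)
to_db = torchaudio.transforms.AmplitudeToDB()

def spectrogram_to_image(waveform: torch.Tensor) -> torch.Tensor:
    """Map a (1, samples) waveform to a (1, 3, 224, 224) ImageNet-shaped tensor."""
    spec = to_db(mel(waveform))                        # (1, n_mels, time)
    spec = (spec - spec.mean()) / (spec.std() + 1e-6)  # per-clip standardization
    spec = spec.unsqueeze(0)                           # (1, 1, n_mels, time)
    spec = torch.nn.functional.interpolate(            # resize to ImageNet input size
        spec, size=(224, 224), mode="bilinear", align_corners=False
    )
    return spec.repeat(1, 3, 1, 1)                     # grayscale -> 3 channels

# 2. ImageNet-pretrained backbone, with the 1000-way head swapped
#    for a new audio classification head.
model = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V2
)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

# 3. Forward pass on dummy audio (1 second at 16 kHz in place of real data).
waveform = torch.randn(1, 16000)
logits = model(spectrogram_to_image(waveform))         # (1, num_classes)
```

One common (informal) explanation for why this transfers at all: the pretrained low-level filters detect generic structure such as edges, blobs, and textures, and spectrograms also contain such structure (harmonics appear as horizontal edges, onsets as vertical edges), so the early layers remain useful while fine-tuning adapts the later layers to audio semantics.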