Can AST be used for audio representation towards solving the frame-level classification tasks? #90

Open
SylviaZiyaZhou opened this issue Dec 26, 2022 · 4 comments
Labels
question Further information is requested

Comments

@SylviaZiyaZhou

Hi Yuan,

I am currently reading your wonderful papers about the AST and SSAST. I wonder if the AST can be used to extract frame-level representation of audio (like music) to solve the frame-level classification tasks? Thanks.

@YuanGongND YuanGongND added the question Further information is requested label Dec 28, 2022
@YuanGongND
Owner

Hi there,

> I wonder if the AST can be used to extract frame-level representation of audio ...

Yes, technically both AST and SSAST can, but some pretraining is needed for good performance. Since AST only supports patch-level pretraining, please try SSAST; see this issue for how to do it.

> (like music) to solve the frame-level classification tasks? Thanks.

I am not sure about this. In our clip-level classification results (reported in the SSAST paper), patch-level SSAST outperforms frame-level SSAST on general audio. I haven't tested music specifically, but the frame-level model might work, since music, like speech, has discrete frequency patterns.
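
To make the patch-level versus frame-level distinction concrete, here is a minimal sketch of how a spectrogram is split into tokens in each case. It is an illustration under stated assumptions, not code from the AST or SSAST repositories: the helper names `token_grid` and `pool_over_frequency` are hypothetical, the patch shapes (16x16 for patch-level, 128x2 for frame-level), the 128 mel bins, and the 1024 input frames are assumed example settings, and the frequency-major token ordering is also an assumption.

```python
from typing import List, Tuple


def token_grid(n_mels: int, n_frames: int, fshape: int, tshape: int) -> Tuple[int, int]:
    """Return (frequency patches, time patches) for non-overlapping patches."""
    return n_mels // fshape, n_frames // tshape


def pool_over_frequency(
    tokens: List[List[float]], f_dim: int, t_dim: int
) -> List[List[float]]:
    """Average the patch tokens that share a time index.

    Assumes tokens are ordered frequency-major: index = f * t_dim + t.
    The result has one embedding per time step, which is the shape a
    frame-level classifier head would consume.
    """
    if len(tokens) != f_dim * t_dim:
        raise ValueError("token count does not match the grid")
    embed_dim = len(tokens[0])
    pooled = []
    for t in range(t_dim):
        # Collect every token in the same time column, across frequency bands.
        column = [tokens[f * t_dim + t] for f in range(f_dim)]
        pooled.append(
            [sum(tok[d] for tok in column) / f_dim for d in range(embed_dim)]
        )
    return pooled


# Assumed example input: 128 mel bins, 1024 time frames.
patch_grid = token_grid(128, 1024, fshape=16, tshape=16)  # (8, 64)
frame_grid = token_grid(128, 1024, fshape=128, tshape=2)  # (1, 512)
```

Under these assumptions both settings yield 512 tokens, but a frame-level token covers all frequencies over 2 spectrogram frames, while a patch-level token covers one frequency band over 16 frames. With patch-level tokens, something like `pool_over_frequency` would be needed to get one embedding per time step, and the time resolution would be 8 times coarser.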

-Yuan

@SylviaZiyaZhou
Author

Hi Yuan, thanks for your reply. I am fine-tuning SSAST on custom data and it works. Are there AST models pretrained on ImageNet? I want to compare AST's performance with that of a ViT pretrained on ImageNet on my own tasks.

@SylviaZiyaZhou
Author

SylviaZiyaZhou commented May 29, 2024 via email
