Can AST be used for audio representation towards solving the frame-level classification tasks? #90
Comments
Hi there,
Yes, technically both AST and SSAST can, but some pretraining is needed for good performance. Since AST only supports patch-level pretraining, please try SSAST; see this issue for how to do it.
I am not sure about this. From our clip-level classification results (shown in the SSAST paper), patch-level SSAST is better than frame-level SSAST for general audio. But I haven't tested specifically on music; frame-level might work, as music also has discrete frequency patterns like speech. -Yuan |
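For readers who want to try this, here is a minimal sketch of one common way to turn patch-level tokens into frame-level representations: put the patch tokens back on their frequency-by-time grid and average over the frequency axis. This is not confirmed in the thread as the authors' method. The function name `patch_tokens_to_frames`, the grid size (12 frequency patches by 101 time patches), and the frequency-major token ordering are assumptions for illustration; check them against the patch embedding of the model you actually use, and drop any CLS or distillation tokens first.

```python
import numpy as np

def patch_tokens_to_frames(tokens, f_dim, t_dim):
    """Average patch tokens over frequency to get one embedding per time step.

    tokens: array of shape (batch, f_dim * t_dim, dim) holding patch tokens only.
            ASSUMPTION: tokens are in frequency-major order, i.e. the token at
            grid position (f, t) sits at index f * t_dim + t.
    Returns an array of shape (batch, t_dim, dim).
    """
    batch, n_tokens, dim = tokens.shape
    if n_tokens != f_dim * t_dim:
        raise ValueError(f"expected {f_dim * t_dim} patch tokens, got {n_tokens}")
    # Restore the 2-D patch grid, then pool away the frequency axis.
    grid = tokens.reshape(batch, f_dim, t_dim, dim)
    return grid.mean(axis=1)

# Toy usage with random numbers standing in for transformer outputs.
# The grid size (12 x 101) and embedding width (768) are illustrative assumptions.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 12 * 101, 768))
frames = patch_tokens_to_frames(tokens, f_dim=12, t_dim=101)
print(frames.shape)
```

A per-frame classifier (for example, a linear layer applied to each of the `t_dim` embeddings) can then be trained on `frames`. Note that each time step here covers one patch width, so labels may need to be resampled to that resolution.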
Hi Yuan, thanks for your reply. I am trying to fine-tune SSAST on custom data, and it works. Are there AST models pretrained on ImageNet? I would like to compare their performance with a ViT pretrained on ImageNet on my own tasks. |
Hi Yuan,
Glad to help. I check another email address ***@***.*** more often. If I do not reply immediately here, perhaps you can contact me via the ust email.
Bests,
Ziya
…________________________________
From: Waseem Randhawa ***@***.***>
Sent: Thursday, May 23, 2024 19:46
@SylviaZiyaZhou<https://github.com/SylviaZiyaZhou> can you please contact me at my email (***@***.***), or share your email? I want to train AST for music chord recognition and need a little guidance.
|
Sorry for the email mistakenly sent to you. Please just ignore it. Thanks!
Bests,
Ziya
…________________________________
From: Ziya Zhou ***@***.***>
Sent: Wednesday, May 29, 2024 18:50
|
Hi Yuan,
I am currently reading your wonderful papers on AST and SSAST. Can AST be used to extract frame-level representations of audio (such as music) for frame-level classification tasks? Thanks.