Can AST be used for audio representation towards solving the frame-level classification tasks? #90

SylviaZiyaZhou · 2022-12-26T03:06:29Z

Hi Yuan,

I am currently reading your wonderful papers about the AST and SSAST. I wonder if the AST can be used to extract frame-level representation of audio (like music) to solve the frame-level classification tasks? Thanks.

YuanGongND · 2022-12-28T23:51:47Z

Hi there,

> I wonder if the AST can be used to extract frame-level representation of audio ...

Yes, technically both AST and SSAST can, but some pretraining is needed for good performance. Since AST only supports patch-level pretraining, please try SSAST; see this issue for how to do it.

> (like music) to solve the frame-level classification tasks? Thanks.

I am not sure about this. From our clip-level classification results (shown in the SSAST paper), patch-level SSAST is better than frame-level SSAST for general audio. But I haven't tested it specifically on music; it might work, as music also has discrete frequency patterns like speech.

-Yuan
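
To make the patch-level versus frame-level distinction above more concrete, here is a minimal sketch of how a spectrogram is split into tokens in each mode, and how patch tokens could be averaged over frequency to get one vector per time step. The patch sizes (16x16 for square patches, 128x2 for frame-style patches), the 128-bin by 1024-frame input, the frequency-major token ordering, and the function names `patch_grid` and `tokens_to_frames` are all assumptions for illustration, not something confirmed in this thread; check them against the model code you actually use.

```python
def patch_grid(n_mels, n_frames, fshape, tshape, fstride, tstride):
    """Count patches along frequency and time for an unpadded strided split."""
    f_dim = (n_mels - fshape) // fstride + 1
    t_dim = (n_frames - tshape) // tstride + 1
    return f_dim, t_dim


def tokens_to_frames(tokens, f_dim, t_dim):
    """Average patch tokens over frequency: f_dim * t_dim tokens -> t_dim vectors.

    Assumes token index = f * t_dim + t (frequency-major flattening); this
    ordering is an assumption and must match the model that produced the tokens.
    """
    if len(tokens) != f_dim * t_dim:
        raise ValueError("token count does not match the patch grid")
    frames = []
    for t in range(t_dim):
        # Collect every token that covers time step t, one per frequency row.
        column = [tokens[f * t_dim + t] for f in range(f_dim)]
        # Element-wise mean across the frequency rows.
        frames.append([sum(vals) / f_dim for vals in zip(*column)])
    return frames


# Illustrative settings for a 128-bin, 1024-frame spectrogram (assumed values).
square_grid = patch_grid(128, 1024, fshape=16, tshape=16, fstride=16, tstride=16)
frame_grid = patch_grid(128, 1024, fshape=128, tshape=2, fstride=128, tstride=2)

# Toy tokens with embedding size 2 on a 2 x 3 grid, standing in for model output.
toy_tokens = [[float(i), 1.0] for i in range(6)]
toy_frames = tokens_to_frames(toy_tokens, f_dim=2, t_dim=3)
```

With frame-style patches each token already spans the full frequency axis, so the token sequence is a time sequence and no pooling is needed. With square patches, pooling over frequency as above is one simple option; in practice this would be done with tensor operations on the model's output rather than Python lists.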

SylviaZiyaZhou · 2023-03-05T14:49:20Z

Hi Yuan, thanks for your reply. I am trying to fine-tune SSAST on custom data, and it works. I wonder if there are AST models pretrained on ImageNet? I just want to compare their performance with a ViT pretrained on ImageNet on my own tasks.

SylviaZiyaZhou · 2024-05-29T10:50:29Z

Hi Yuan, Glad to help. I check another email address ***@***.*** more often. If I do not reply immediately here, perhaps you can contact me via the ust email. Bests, Ziya

> From: Waseem Randhawa, Thursday, May 23, 2024 19:46
>
> @SylviaZiyaZhou can you please contact me on my ***@***.*** or please share your email I wanted train the AST for music chord recognition. I needed a little bit guidance.

SylviaZiyaZhou · 2024-05-29T10:52:53Z

Sorry for the email mistakenly sent to you. Please just ignore it. Thanks! Bests, Ziya


YuanGongND added the "question" label (Further information is requested) on Dec 28, 2022
