
Add support for pyannote segmentation 3.0 (experience sharing) #262

Open
ywangwxd opened this issue Dec 20, 2024 · 3 comments

Comments

@ywangwxd

ywangwxd commented Dec 20, 2024

Hi, if anyone like me is working on the feat/diart-asr branch and wants to add support for pyannote segmentation 3.0, here is what I have done.

You only need to make some changes in diarization.py. Segmentation 3.0 outputs activations for seven classes instead of three (for details please refer to the paper). What's more, the activation values are the ones before a softmax transformation. So I simply added a softmax and kept only the activations of the three single-speaker classes, ignoring the other four labels. This way it behaves almost the same as the old version of segmentation, but be aware that it will miss overlapping speech segments.

[screenshot: diff of the change in diarization.py]
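Not the actual diff from the screenshot, just a minimal sketch of the workaround described above. The function name is illustrative, and the assumed class ordering (non-speech first, then the three single-speaker classes, then the three overlap classes) is an assumption, not something confirmed in this thread:

```python
import torch

def powerset_to_single_speaker(activations: torch.Tensor) -> torch.Tensor:
    """Drop overlap classes from raw 7-class powerset activations.

    Assumes `activations` has shape (batch, frames, 7) with class order
    [non-speech, spk1, spk2, spk3, spk1+2, spk1+3, spk2+3] (assumed order).
    Returns (batch, frames, 3) single-speaker probabilities.
    """
    # Turn the raw activations into a probability distribution over the 7 classes.
    probs = torch.softmax(activations, dim=-1)
    # Keep only the three single-speaker classes; the overlap classes are
    # discarded, which is why this workaround misses overlapping speech.
    return probs[..., 1:4]
```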

@juanmc2005
Owner

Hey @ywangwxd, just FYI you can use the Powerset class from pyannote.audio to do this without sacrificing overlapping speech.
In particular, take a look at the method called to_multilabel()
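A rough sketch of that suggestion, assuming Powerset is importable from pyannote.audio.utils.powerset and takes the number of speakers and the maximum number of simultaneous speakers as constructor arguments (3 and 2 for segmentation 3.0); whether to_multilabel() expects raw activations or log-probabilities should be checked against the installed pyannote.audio version:

```python
import torch
from pyannote.audio.utils.powerset import Powerset

# segmentation 3.0: 3 speakers, at most 2 active at once -> 7 powerset classes
powerset = Powerset(3, 2)

def to_speaker_activations(powerset_output: torch.Tensor) -> torch.Tensor:
    """Map (batch, frames, 7) powerset predictions to (batch, frames, 3)
    per-speaker activations, so overlapping speech is preserved."""
    return powerset.to_multilabel(powerset_output)
```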

@ywangwxd
Author

Hey @ywangwxd, just FYI you can use the Powerset class from pyannote.audio to do this without sacrificing overlapping speech. In particular, take a look at the method called to_multilabel()

Thank you, this is the benefit of sharing my experience. Someone else may point out a better solution :-)

@ywangwxd
Author

ywangwxd commented Dec 23, 2024

Hey @ywangwxd, just FYI you can use the Powerset class from pyannote.audio to do this without sacrificing overlapping speech. In particular, take a look at the method called to_multilabel()

After taking a detailed look, I switched to making the changes in models.py instead of the upper-level diarization.py. Here is a screenshot of the diff; I referred to the code in version 0.9.1. Just to confirm: this way, any overlapping speech will be labelled as the single speaker with the loudest voice, right?

[screenshot: diff of the change in models.py]
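The diff in the screenshot isn't visible here; below is only a hedged sketch of how a segmentation wrapper in models.py could apply the powerset-to-multilabel conversion after the forward pass. The class name PowersetAdapter is illustrative and not diart's actual API. Wrapping at the model level means the upper-level diarization.py can stay unchanged:

```python
import torch
from pyannote.audio.utils.powerset import Powerset

class PowersetAdapter(torch.nn.Module):
    """Illustrative wrapper (not diart's real SegmentationModel) that maps the
    7-class powerset output of segmentation 3.0 back to 3 speaker labels."""

    def __init__(self, segmentation_model: torch.nn.Module):
        super().__init__()
        self.model = segmentation_model
        # 3 speakers, at most 2 simultaneous speakers -> 7 powerset classes
        self.powerset = Powerset(3, 2)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, frames, 7) powerset activations from the underlying model
        powerset_output = self.model(waveform)
        # (batch, frames, 3) per-speaker activity, overlap preserved
        return self.powerset.to_multilabel(powerset_output)
```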
