Hi, if anyone like me is working on the feat/diart-asr branch and wants to add support for pyannote segmentation 3.0, here is what I did.
You only need to make some changes in diarization.py. Segmentation 3.0 outputs activations for seven classes instead of three (see the paper for details). What's more, the activation values are raw, i.e. taken before a softmax transformation. So I simply added a softmax and kept only the activations of the three single-speaker classes, ignoring the other four labels. In this way the output is almost the same as the old segmentation model. Be aware, though, that this discards overlapping speech segments. A minimal sketch of the idea follows.
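For reference, here is a minimal sketch of that workaround. The function name is illustrative rather than diart's API, and the class ordering is my assumption: pyannote's powerset classes for 3 speakers with at most 2 active are, in order, {}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}.

```python
import torch

def powerset_to_single_speakers(activations: torch.Tensor) -> torch.Tensor:
    """Collapse raw 7-class powerset activations to 3 per-speaker scores.

    activations: (batch, frames, 7) raw (pre-softmax) model output.
    Returns: (batch, frames, 3), one score per speaker.
    """
    probs = torch.softmax(activations, dim=-1)
    # Indices 1..3 are assumed to be the single-speaker classes;
    # index 0 is non-speech and indices 4..6 are the overlap classes,
    # which this shortcut simply discards.
    return probs[..., 1:4]
```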
Hey @ywangwxd, just FYI you can use the Powerset class from pyannote.audio to do this without sacrificing overlapping speech.
In particular, take a look at the method called to_multilabel()
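For readers landing here, a minimal sketch of what that looks like (`Powerset` and `to_multilabel()` are real pyannote.audio APIs; the shapes and the random tensor are just stand-ins for the example):

```python
import torch
from pyannote.audio.utils.powerset import Powerset

# Segmentation 3.0 models 3 speakers with at most 2 active at once,
# which yields 7 powerset classes.
powerset = Powerset(num_classes=3, max_set_size=2)

# Stand-in for the raw model output of shape (batch, frames, 7).
activations = torch.randn(1, 293, 7)

# Hard decoding: argmax over the 7 classes, then map each class to its
# member speakers -> (batch, frames, 3). Overlap classes activate two
# speakers at once, so overlapping speech is preserved.
multilabel = powerset.to_multilabel(activations)
# Newer pyannote.audio versions also accept soft=True to keep
# continuous per-speaker scores instead of hard 0/1 assignments.
```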
Thank you, this is the benefit of sharing: someone else may point me to a better solution :-)
After taking a closer look, I switched to making the changes in models.py instead of the higher-level diarization.py. Here is the screenshot of the diff; I referred to the corresponding code in version 0.9.1. Just to confirm: in this way, any overlapping speech will be labelled as the single speaker with the loudest voice, right?
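Not sure what the diff does exactly, but one way to check the overlap behaviour yourself is to feed `to_multilabel()` a frame where an overlap class wins the argmax; with hard decoding that frame maps to two active speakers rather than one:

```python
import torch
from pyannote.audio.utils.powerset import Powerset

powerset = Powerset(num_classes=3, max_set_size=2)

# One fake frame whose largest activation is the {speaker 1, speaker 2}
# overlap class (assumed class order: {}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}).
frame = torch.tensor([[[0.0, 0.1, 0.1, 0.1, 5.0, 0.0, 0.0]]])
print(powerset.to_multilabel(frame))
# tensor([[[1., 1., 0.]]]) -> both speakers are marked active,
# so the overlap survives the conversion.
```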