
DiffSinger output is non-deterministic for a given score #1440

Open
1 task done
yezhiyi9670 opened this issue Mar 9, 2025 · 0 comments

Acknowledgement

  • I have read Getting-Started and FAQ

🐛 Describe the bug

Even for identical scores, DiffSinger singers produce different audio on different runs. This could be considered a bug, since it prevents a singing voice synthesis project from producing predictable output, effectively making the project "unmaintainable": for example, once the cache is gone, it is impossible to correct a flaw in one part of the score without affecting the rest.

I understand that generative deep learning models are expected to have some randomness, since they are in effect "sampling" from a distribution. However, reproducibility of the audio output is very important. As far as I know, two approaches could fix this:

  • Explicitly check the relevant cache data into the project file (or into a file next to it), so that the cache is not volatile and the output stays reproducible. This can take up a lot of disk space, but having such an option is worthwhile.
  • Use a fixed seed (ideally changeable by the user per musical part, or even per sentence) during inference. The ONNX runtime seems to have an interface for setting a seed. To ensure reproducibility, the seed must also be written into the project file.
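The fixed-seed idea can be sketched in stdlib Python. The derivation scheme and function names below are illustrative assumptions, not OpenUtau's or DiffSinger's actual code: a per-sentence seed is derived from a project-level seed stored in the project file, so each sentence samples independently but reproducibly.

```python
import hashlib
import random

def sentence_seed(project_seed: int, sentence_id: str) -> int:
    """Derive a stable per-sentence seed from the project-level seed.

    Hypothetical scheme: hash the project seed together with a sentence
    identifier, so editing one sentence's seed never affects the others.
    """
    digest = hashlib.sha256(f"{project_seed}:{sentence_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def sample_noise(seed: int, n: int) -> list:
    """Stand-in for the diffusion model's initial noise draw."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

seed = sentence_seed(project_seed=42, sentence_id="part1/sentence3")
run1 = sample_noise(seed, 4)
run2 = sample_noise(seed, 4)
assert run1 == run2  # identical noise in -> identical rendered audio out
```

With a scheme like this, clearing the cache is harmless: as long as the project-level seed is saved in the project file, re-rendering draws the same noise and produces the same audio.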

How to reproduce the bug

  1. Select a DiffSinger-based singer.
  2. Write a piece of score.
  3. Load the predicted pitch curve from the model.
  4. Play the piece to render the audio. Then click Tools > Clear cache in the main window and play again to re-render it. Repeat several times and compare the generated audio.
    ❎ The audio differs between runs even though no changes were made to the score.
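To make the comparison in the last step objective rather than by ear, one can fingerprint each rendered file with a content hash; any bit-level difference between re-renders then shows up immediately. This is just a sketch, and the wav payloads below are fake placeholders standing in for real rendered files:

```python
import hashlib

def audio_fingerprint(wav_bytes: bytes) -> str:
    """SHA-256 of the raw file contents; any bit-level change is visible."""
    return hashlib.sha256(wav_bytes).hexdigest()

# Deterministic synthesis would give every re-render the same fingerprint.
render_run1 = b"RIFF-fake-wav-payload-run1"
render_run2 = b"RIFF-fake-wav-payload-run2"
print(audio_fingerprint(render_run1) == audio_fingerprint(render_run2))
# prints False: the payloads differ, as the re-rendered audio does here
```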

For example, I am using the QiXuan_v2.5.0 and the following piece:

The score containing "天上的星星不说话,地上的娃娃想妈妈;" in E key

This phrase is taken from a well-known song that is probably already included in the training data, so I have deliberately transposed it up by two semitones.

Here are the waveforms of 5 samples obtained from the same score. They are slightly different.

Image of five different audio samples obtained from the same score

They also have audible differences, especially in the pronunciation of "上" (shang), "说" (shuo), and "想" (xiang). In one of the samples, "上" is pronounced almost as "sheng". Having such a mistake occur unpredictably is unacceptable.

OS & Version

Windows 11 Version 24H2 (OS Build 26100.3323)

Logs

log20250309.txt
