Scratching noise when converting text to speech. PCM data is definitely broken (screenshot attached). #10

Open
alexey-v-paramonov opened this issue Aug 14, 2023 · 4 comments

Comments

@alexey-v-paramonov

Describe the bug
Scratching noise in the PCM audio. When I import the PCM into Audacity, I can see 2 peaks that produce distortion.

To Reproduce
The following code:

#!/usr/bin/python3

from speechkit import Session
from speechkit import SpeechSynthesis

SPEECHKIT_API_KEY = "***"
SPEECHKIT_FOLDER_ID = "***"

voice = "filipp"
speed = "1"
emotion = "neutral"

# Read the source text; the context manager closes the file afterwards
with open("tts.log", "r") as f:
    text = f.read()
session = Session.from_api_key(SPEECHKIT_API_KEY, SPEECHKIT_FOLDER_ID)
synthesizeAudio = SpeechSynthesis(session)

pcm_buff = synthesizeAudio.synthesize_stream(
    text=text,
    lang="ru-RU",
    voice=voice,
    format='lpcm',
    speed=speed,
    emotion=emotion,
    sampleRateHertz='48000'
)
# Write the raw (headerless) PCM bytes to disk
with open("pcm.buff", "wb") as f:
    f.write(pcm_buff)

Expected behavior
Clean audio

Screenshots
tts

Additional context
tts.log

Source text attached.

Python 3.10.12
speechkit==2.2.2

@tikhonp
Owner

tikhonp commented Aug 15, 2023

Hello! Thank you for the bug report.

I ran the sample code and didn't notice any issues. Could you provide your pcm.buff file?

One thing you may have missed: LPCM data from Yandex SpeechKit comes raw (without any WAV headers), so audio software may not know how to play it. To play it, I saved it differently. Instead of your open("pcm.buff", "wb").write(pcm_buff), I used Python's standard wave library:

import wave

# ...

with wave.open('pcm.wav', 'wb') as f:
    f.setnchannels(1) # set number of audio channels. 1 means mono
    f.setsampwidth(2) # set number of bytes per sample 
    f.setframerate(48000) # set sample rate in hertz. 48000 is sampleRateHertz passed in synthesizeAudio.synthesize_stream call
    f.writeframes(pcm_buff) # and finally write the data to file

Without this metadata, the file may not play correctly.
Please let me know whether this helps. If not, I will investigate further.
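
As a quick sanity check on the parameters above, the sketch below wraps a raw buffer in a WAV container in memory and reads the duration back. It assumes 16-bit mono samples at 48000 Hz, matching the values passed to the wave calls above. The helper names wrap_pcm_as_wav and wav_duration_seconds are made up for illustration and are not part of speechkit.

```python
import io
import wave


def wrap_pcm_as_wav(pcm: bytes, sample_rate: int = 48000,
                    channels: int = 1, sample_width: int = 2) -> bytes:
    """Return WAV-formatted bytes that wrap the given raw PCM data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as f:
        f.setnchannels(channels)      # 1 = mono (assumed)
        f.setsampwidth(sample_width)  # 2 bytes = 16-bit samples (assumed)
        f.setframerate(sample_rate)   # must match sampleRateHertz
        f.writeframes(pcm)
    return buf.getvalue()


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Read the WAV header back and compute the duration from it."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as f:
        return f.getnframes() / f.getframerate()
```

If the duration reported here is far from the expected length of the spoken text, the sample rate or sample width is probably not what the header claims.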

@alexey-v-paramonov
Author

alexey-v-paramonov commented Aug 15, 2023

pcm.buff: https://disk.yandex.ru/d/ZeVoxVWJIcdSWw

Yes, I completely understand what LPCM is. What you see in my screenshot is not a player but an audio editor. I import the data into that editor as "raw" format, and the artifacts in the sound are clearly visible. So the noise is not coming from the player; it is there in the raw audio.

To confirm the problem and make sure it is not related to the audio format that I am using (raw or wav) I've also made a wav-version using your code above, here it is:
https://disk.yandex.ru/d/uA3Dxj2Lh0OksQ

The same issue persists in the WAV (timing: 2:08).
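
For anyone who wants to locate such peaks without an audio editor, here is a rough sketch that scans the raw buffer for abrupt sample-to-sample jumps. It assumes 16-bit signed little-endian mono samples at 48000 Hz. The function name find_clicks and the threshold value are arbitrary choices for illustration, not anything provided by speechkit or Yandex.

```python
import array
import sys


def find_clicks(pcm: bytes, sample_rate: int = 48000, threshold: int = 20000):
    """Return (time_in_seconds, jump) for each jump between adjacent samples
    that exceeds the threshold. Assumes 16-bit signed little-endian mono."""
    samples = array.array("h")
    # Drop a trailing odd byte so the buffer is a whole number of samples
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # the data is assumed to be little-endian
    clicks = []
    for i in range(1, len(samples)):
        jump = abs(samples[i] - samples[i - 1])
        if jump > threshold:
            clicks.append((i / sample_rate, jump))
    return clicks
```

Running it on the contents of pcm.buff would give candidate timestamps to compare against the 2:08 mark. Loud but legitimate speech can also trigger it, so the threshold may need tuning.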

@tikhonp
Copy link
Owner

tikhonp commented Aug 17, 2023

Now I understand your problem. It comes from how Yandex generates the data, so I cannot fix it :(

A possible solution is to use the Yandex gRPC API v3 instead of the REST API that this library uses. It has a loudness_normalization_type parameter, which might help solve the problem.

I tried to compile gRPC, but I ran into a known issue with cgrpc and Python on a MacBook with M2 ARM silicon. When I have more time, I will try to solve it and add this functionality to this package.

If you get the chance and manage to solve this problem, please make a pull request for this library.

@alexey-v-paramonov
Author

Okay, thanks
