Scratching noise when converting text to speech. PCM data is definitely broken (screenshot attached). #10

alexey-v-paramonov · 2023-08-14T16:40:26Z

Describe the bug
Scratching noise in the PCM audio. When I import the PCM data into Audacity, I can see two peaks that produce the distortion.
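The two peaks can also be located programmatically instead of by eye. This minimal sketch assumes the LPCM output is headerless 16-bit signed little-endian mono at the requested sample rate. `find_peaks` is a hypothetical helper and is not part of the speechkit package. The `level` and `jump` thresholds are illustrative guesses that will likely need tuning.

```python
import array
import sys


def find_peaks(raw: bytes, sample_rate: int = 48000,
               level: int = 30000, jump: int = 20000):
    """Return (seconds, sample_value) pairs for suspicious samples.

    Assumes headerless 16-bit signed little-endian mono PCM.
    The thresholds are hypothetical and may need tuning.
    """
    samples = array.array("h")
    samples.frombytes(raw[: len(raw) - len(raw) % 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian; convert to native order
    hits = []
    prev = 0
    for i, s in enumerate(samples):
        # Flag near-full-scale samples and abrupt sample-to-sample jumps.
        if abs(s) >= level or abs(s - prev) >= jump:
            hits.append((i / sample_rate, s))
        prev = s
    return hits
```

When called on the bytes read from `pcm.buff`, the function reports each hit as a (seconds, sample value) pair. You can compare those times directly with the positions of the peaks in Audacity.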

To Reproduce
The following code:

#!/usr/bin/python3

from speechkit import Session
from speechkit import SpeechSynthesis

SPEECHKIT_API_KEY = "***"
SPEECHKIT_FOLDER_ID = "***"

voice = "filipp"
speed = "1"
emotion = "neutral"

with open("tts.log", "r", encoding="utf-8") as f:
    text = f.read()
session = Session.from_api_key(SPEECHKIT_API_KEY, SPEECHKIT_FOLDER_ID)
synthesizeAudio = SpeechSynthesis(session)

pcm_buff = synthesizeAudio.synthesize_stream(
    text=text,
    lang="ru-RU",
    voice=voice,
    format='lpcm',
    speed=speed,
    emotion=emotion,
    sampleRateHertz='48000'
)
with open("pcm.buff", "wb") as f:
    f.write(pcm_buff)

Expected behavior
Clean audio

Screenshots

Additional context
tts.log

Source text attached.

Python 3.10.12
speechkit==2.2.2


tikhonp · 2023-08-15T14:48:27Z

Hello! Thank you for the bug report.

I ran the sample code and didn't notice any issue. Can you provide your pcm.buff file?

One thing you may have misunderstood is that the LPCM data from Yandex SpeechKit comes raw, without any WAV header. Music players may therefore not know how to play it, so I saved it differently. Instead of writing pcm_buff straight to pcm.buff, I used Python's standard wave library:

import wave

# ...

with wave.open('pcm.wav', 'wb') as f:
    f.setnchannels(1)        # number of audio channels: 1 means mono
    f.setsampwidth(2)        # bytes per sample: 2 means 16-bit
    f.setframerate(48000)    # sample rate in Hz; must match sampleRateHertz passed to synthesizeAudio.synthesize_stream
    f.writeframes(pcm_buff)  # finally, write the audio data to the file

Without this metadata, the file may not play correctly.
Please let me know if this helps. If it doesn't, I will investigate further.
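If the raw file has already been saved, you can add a WAV header to it afterwards. This minimal sketch uses the same parameters as the snippet in the reply (mono, 16-bit, 48000 Hz). `raw_to_wav` is a hypothetical helper name. Like that snippet, the sketch assumes a little-endian machine, which is the usual case.

```python
import wave


def raw_to_wav(raw_path: str, wav_path: str, sample_rate: int = 48000) -> None:
    """Wrap headerless 16-bit mono LPCM from raw_path in a WAV container."""
    with open(raw_path, "rb") as src:
        data = src.read()
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(1)            # mono
        dst.setsampwidth(2)            # 16-bit samples
        dst.setframerate(sample_rate)  # must match sampleRateHertz of the request
        dst.writeframes(data)          # the header's frame count is filled in on close
```

For example, `raw_to_wav("pcm.buff", "pcm.wav")` converts the file from the original report without calling the API again.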

alexey-v-paramonov · 2023-08-15T15:19:08Z

pcm.buff: https://disk.yandex.ru/d/ZeVoxVWJIcdSWw

Yes, I completely understand what LPCM is. The screenshot doesn't show a player; it shows an audio editor. I import the data into the editor in raw format, and the artifacts in the sound are clearly visible. So the noise doesn't come from the player; it is in the raw audio itself.

To confirm the problem and rule out the audio format I was using (raw or WAV), I also made a WAV version with your code above. Here it is:
https://disk.yandex.ru/d/uA3Dxj2Lh0OksQ

The same issue persists in the WAV (at 2:08).
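You can inspect the spot at 2:08 without scrolling through the whole file by converting the timestamp to a byte offset. At 48000 Hz with 2-byte mono samples, the stream carries 48000 × 2 = 96,000 bytes per second. So 2:08 (128 s) starts at byte 128 × 96,000 = 12,288,000. This sketch computes that offset and writes a short window around it to a separate WAV file. It makes the same 16-bit mono assumption, and `byte_offset` and `cut_window` are hypothetical helper names.

```python
import wave


def byte_offset(t_seconds: float, rate: int = 48000) -> int:
    """Byte offset of time t in headerless 16-bit mono PCM (2 bytes per frame)."""
    return int(t_seconds * rate) * 2  # whole frames only, so a sample is never split


def cut_window(raw: bytes, t_seconds: float, span: float = 1.0,
               rate: int = 48000, out_path: str = "window.wav") -> int:
    """Write `span` seconds centred on t_seconds to a WAV; return bytes written."""
    start = byte_offset(max(t_seconds - span / 2, 0.0), rate)
    end = min(byte_offset(t_seconds + span / 2, rate), len(raw) - len(raw) % 2)
    chunk = raw[start:end]
    with wave.open(out_path, "wb") as w:
        w.setnchannels(1)    # mono
        w.setsampwidth(2)    # 16-bit
        w.setframerate(rate)
        w.writeframes(chunk)
    return len(chunk)
```

For instance, calling `cut_window(raw, 2 * 60 + 8)` on the contents of `pcm.buff` writes the one second centred on 2:08 to `window.wav`. That short clip is easier to zoom into than the full recording.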

tikhonp · 2023-08-17T14:27:45Z

Now I understand your problem. It comes from how Yandex generates the data, so I cannot fix it :(

So the solution may be to use the Yandex gRPC API v3 instead of the REST API that this library uses. It has a loudness_normalization_type parameter that might help solve the problem.

I tried to compile gRPC, but I ran into a known issue with cgrpc and Python on a MacBook with an M2 (ARM) chip. When I have more time, I will try to solve it and add this functionality to this package.

If you get the chance and manage to solve this problem, please open a pull request for this library.

alexey-v-paramonov · 2023-08-17T16:32:35Z

Okay, thanks
