Scratching noise when converting text to speech. PCM data is definitely broken (screenshot attached). #10
Comments
Hello! Thank you for the bug report. I ran the sample code and didn't notice any issue. Can you provide your [...] One thing that maybe you misunderstood is that [...]
import wave
# ...
with wave.open('pcm.wav', 'wb') as f:
    f.setnchannels(1)  # number of audio channels; 1 means mono
    f.setsampwidth(2)  # number of bytes per sample
    f.setframerate(48000)  # sample rate in hertz; 48000 is the sampleRateHertz passed in the synthesizeAudio.synthesize_stream call
    f.writeframes(pcm_buff)  # finally, write the data to the file
Without this metadata the file may not play correctly.
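To check that the header settings are not the cause, the same wrapping can be done in memory and read back. This is only a sketch: it assumes the data is 16-bit signed little-endian mono at 48000 Hz, and the helper names `pcm_to_wav_bytes` and `wav_info` and the generated test tone are illustrative stand-ins, not part of the library.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 48000  # assumed to match the sampleRateHertz used for synthesis


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as f:
        f.setnchannels(1)        # mono
        f.setsampwidth(2)        # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(pcm)
    return buf.getvalue()


def wav_info(wav_bytes: bytes) -> tuple:
    """Read back (channels, sample width, frame rate, frame count)."""
    with wave.open(io.BytesIO(wav_bytes), 'rb') as f:
        return (f.getnchannels(), f.getsampwidth(),
                f.getframerate(), f.getnframes())


# Stand-in for pcm_buff: 0.1 s of a 440 Hz tone instead of real TTS output.
n_samples = SAMPLE_RATE // 10
pcm = b''.join(
    struct.pack('<h', int(10000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)))
    for i in range(n_samples)
)

wav_bytes = pcm_to_wav_bytes(pcm)
print(wav_info(wav_bytes))
```

If the values read back match what was requested and the noise is still audible, the distortion is in the PCM samples themselves rather than in the WAV metadata.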
Yes, I completely understand what is [...] To confirm the problem and make sure it is not related to the audio format that I am using (raw or wav), I've also made a wav version using your code above; here it is: [...] The same issue persists in the wav (timing: 2:08).
Now I understand your problem: it's how Yandex generates the data, and I cannot fix it :( The solution may be to use the Yandex gRPC API v3 instead of the REST API that I'm using in this lib; it has [...] I tried to compile gRPC, but I hit a known issue with cgrpc and Python on a MacBook with M2 ARM silicon, so when I have more time I will try to solve it and add this functionality to this package. If you have the opportunity and manage to solve this problem, please make a pull request for this library.
Okay, thanks
Describe the bug
Scratching noise in the PCM audio. If I import the PCM into Audacity, I can see 2 peaks that produce distortion.
To Reproduce
The following code:
Expected behavior
Clean audio
Screenshots
Additional context
tts.log
Source text attached.
Python 3.10.12
speechkit==2.2.2
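The peaks seen in Audacity could also be located programmatically by looking for abrupt jumps between neighbouring samples. The following is a rough sketch, not a diagnosis of the service's output: it assumes 16-bit signed little-endian mono PCM, the `find_clicks` helper is hypothetical, and the threshold of 20000 is an arbitrary starting value that would need tuning against real speech.

```python
import array
import struct
import sys


def find_clicks(pcm: bytes, threshold: int = 20000) -> list[int]:
    """Return indices of samples that differ from the previous sample
    by more than `threshold` (hypothetical helper, 16-bit LE PCM assumed)."""
    samples = array.array('h')
    # Drop a trailing odd byte so the buffer is a whole number of samples.
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if sys.byteorder == 'big':
        samples.byteswap()  # the data is assumed little-endian
    return [
        i for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > threshold
    ]


# Synthetic example: a slow ramp with one spike injected at index 4.
demo = struct.pack('<7h', 0, 100, 200, 300, 30000, 400, 500)
clicks = find_clicks(demo)
print(clicks)

# Convert sample indices to seconds, assuming a 48000 Hz sample rate.
print([i / 48000 for i in clicks])
```

Running this over the saved PCM and comparing the reported times with the position heard in the wav (around 2:08) would show whether the scratch corresponds to a discontinuity in the sample data.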