New in version 0.4-a2
A new Russian female voice has been included. It is trained on a
larger speech database, recorded by Irina Vorobyova. As usual, it
seems to have speech quality problems because of pitch extraction
failures, which is especially noticeable in utterance-final syllables.
I tried to model interrogative intonation explicitly by including many
questions in the recording script, but the attempt seems to be only
partially successful.
Several new configuration settings have been added, including a
pseudo-English mode for Russian voices.
I now use the following two sources of information about stress
placement in Russian: a stress dictionary generated from RuLex, which
is a Russian pronunciation dictionary by Igor Poretsky, and a stress
dictionary extracted from Russian Wiktionary.
This version of RHVoice introduces a native implementation of an
output module for Speech Dispatcher, though some features, such as
reading of key presses, have not been implemented yet.
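For example, once the module is registered with Speech Dispatcher, it
should be reachable through the standard spd-say client. The module
name "rhvoice" below is an assumption on my part; use whatever name
the module is registered under in speechd.conf:

  spd-say -o rhvoice "Hello from RHVoice"
  spd-say -o rhvoice -l ru "Привет"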
New in version 0.4-a1
This is actually not an update, but a completely new synthesizer.
Not being the best of programmers in general, and being especially
bad at programming in pure C, I found it more and more difficult to
maintain
that old code. Some new features were just impossible to implement
using the existing infrastructure. The most important change I
couldn't make was multilingual support. So I rewrote the whole thing
from scratch in C++.
This version is not ready for production use. Some features are
incomplete, some are still missing, and there must be serious bugs I
haven't discovered yet.
At the moment RHVoice can speak in three languages: Russian, English
and Esperanto. There is only one Esperanto voice, and I expect it to
be the most problematic in terms of speech quality, because the
speaker often used creaky voice during the recording.
RHVoice now supports automatic language switching, provided the
languages have no common letters. So at the moment there are two
language pairs: Russian+English and Russian+Esperanto. Whichever
interface is used, the client must set the main voice, which
determines the primary language. An optional extra voice determines
whether language switching is enabled. If switching is turned on and
the synthesizer cannot detect the language of a sentence, the primary
language is used.
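The following toy C program illustrates that rule. It is only an
explanatory sketch, not RHVoice code, and every name in it is
invented. The "detector" relies on exactly the property mentioned
above: the paired languages share no letters, so a sentence's
alphabet identifies its language.

  #include <stdio.h>

  /* Toy sketch of the language-switching rule; not RHVoice code.
     Latin letters mean the extra language (say, English), non-ASCII
     bytes mean the primary one (say, Russian). */
  typedef enum { LANG_UNKNOWN, LANG_PRIMARY, LANG_EXTRA } lang_id;

  static lang_id detect_language(const char* s)
  {
    int latin = 0, other = 0;
    for (; *s; ++s) {
      unsigned char c = (unsigned char)*s;
      if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) latin = 1;
      else if (c >= 0x80) other = 1; /* e.g. a UTF-8 Cyrillic byte */
    }
    if (latin && !other) return LANG_EXTRA;
    if (other && !latin) return LANG_PRIMARY;
    return LANG_UNKNOWN; /* mixed text, or no letters at all */
  }

  /* Switching happens only if an extra voice is set; whenever
     detection fails, the primary language is used. */
  static lang_id choose_language(int extra_voice_set, const char* text)
  {
    if (!extra_voice_set) return LANG_PRIMARY;
    lang_id detected = detect_language(text);
    return detected == LANG_UNKNOWN ? LANG_PRIMARY : detected;
  }

  int main(void)
  {
    printf("%d\n", choose_language(1, "Hello")); /* 2: extra language */
    printf("%d\n", choose_language(1, "12345")); /* 1: falls back */
    return 0;
  }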
The concept of variants has been dropped for now, because it doesn't
really fit a multilingual text-to-speech system, and I am not sure
what I should do about it.
The previous version was painful to use from the command line,
because it took a long time to load the voice into memory. The new
one is even worse in this respect, because language data is also
stored in external binary files and it takes time to load them.
That's why I haven't even written a proper command-line interface for
the new
RHVoice yet. But I've implemented a D-Bus service and a command-line
client for it.
It is now possible to set language-specific and voice-specific prosody
options in the configuration file, e.g. you can use a faster rate
with your native language. The sonic library is now the only supported
method of speeding up or slowing down speech. Support for applying
filters to speech is temporarily dropped, but I plan to return it in
0.5. Punctuation levels and capitalization indication haven't been
implemented yet either, but they should be implemented before the
final release of 0.4.
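As an illustration of the prosody options mentioned at the start of
this paragraph, a configuration might look roughly like the lines
below. Every key and name here is an assumption, not the documented
syntax, so check the shipped sample configuration for the real
spelling:

  ; hypothetical keys and names, for illustration only
  languages.russian.default_rate=1.4
  languages.english.default_rate=1.0
  voices.elena.default_pitch=1.1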
User dictionaries will most likely come back only in 0.5, although
the format will be different (simpler, no regular expressions).
The worst news is probably that the C API is not backward compatible.
Since there will be at least three clients/interfaces developed by me
personally, I fixed what I think was wrong with the old API:
* There are no global initialization/uninitialization functions
anymore; clients must create their own private instances of the TTS engine.
* There won't be any global settings, even for the particular instance
of the engine, because synthesis requests are usually processed in a
separate thread, and global settings don't make sense in this case.
* There are now separate callbacks for different types of events, and
most of them are optional.
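To make the shape of this concrete, here is a rough sketch in C of
what a client-facing header in this style could look like. All type
and function names below are invented for illustration; they are not
the actual declarations from the new header:

  /* Invented names, for illustration only; not the real RHVoice API. */
  typedef struct tts_engine tts_engine; /* opaque, one per client */

  /* Separate callbacks for different event types; most are optional
     and may be left NULL. */
  typedef struct {
    int  (*play_audio)(const short* samples, unsigned count, void* user);
    void (*word_started)(unsigned pos, unsigned len, void* user);
    void (*sentence_started)(unsigned pos, unsigned len, void* user);
  } tts_callbacks;

  /* No global initialization: each client creates its own engine. */
  tts_engine* tts_engine_new(const tts_callbacks* cb, void* user);
  void tts_engine_delete(tts_engine* engine);

  /* No global settings: everything a request needs is passed with it,
     so requests can be processed safely on a separate thread. */
  typedef struct {
    const char* main_voice;  /* determines the primary language */
    const char* extra_voice; /* optional; enables language switching */
    double rate, pitch, volume;
  } tts_params;

  int tts_engine_speak(tts_engine* engine, const char* text,
                       const tts_params* params);

The per-request parameter struct is what makes the threading point
above work: two threads can speak with different settings through the
same engine without stepping on each other.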