SSML consists of XML-like tags, for example: Did you mean the <emphasis level="strong"><prosody pitch="75">green</prosody></emphasis> beans?
The following markup tags and attributes are recognised:
- xml:base (the value is passed back unchanged as a parameter of the UriCallback() function)
- xml:lang
- xml:lang
- name
- age
- variant
- gender
- rate (x-slow, slow, medium, fast, x-fast, or a percentage such as 125%)
- volume (silent, x-soft, soft, medium, loud, x-loud, +1dB or -1dB)
- pitch (a number, for example "75")
- range (default, x-low, low, medium, high, x-high)
- interpret-as="characters"
- interpret-as="characters" format="glyphs"
- interpret-as="tts:key"
- interpret-as="tts:char"
- interpret-as="tts:digits"
- name
- xml:lang
- xml:lang
- alias
- field="punctuation" mode=none,all,some
- field="capital_letters" mode=no,spelling,icon,pitch
- src
- level (none, reduced, moderate, strong, or x-strong)
- strength
- time
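As an illustration of how these attribute values fit together, the following Python sketch rebuilds the example sentence from the top of this section and rejects values that are not in the lists above. It is not part of eSpeak: the helper names (`prosody`, `emphasis`, `build_example`) are hypothetical, the `<speak>` root element is taken from the SSML specification rather than from this page, and the percentage and dB patterns are assumptions about what eSpeak accepts.

```python
import re
import xml.etree.ElementTree as ET

# Named values copied from the attribute list above.
RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}
VOLUMES = {"silent", "x-soft", "soft", "medium", "loud", "x-loud"}
RANGES = {"default", "x-low", "low", "medium", "high", "x-high"}
LEVELS = {"none", "reduced", "moderate", "strong", "x-strong"}


def prosody(text, rate=None, volume=None, pitch=None, range_=None):
    """Wrap text in a <prosody> element after checking the attribute values."""
    attrs = {}
    if rate is not None:
        # Either a named rate or a percentage such as "125%" (assumed pattern).
        if rate not in RATES and not re.fullmatch(r"\d+%", rate):
            raise ValueError(f"unsupported rate: {rate!r}")
        attrs["rate"] = rate
    if volume is not None:
        # Either a named volume or a change such as "+1dB" (assumed pattern).
        if volume not in VOLUMES and not re.fullmatch(r"[+-]\d+dB", volume):
            raise ValueError(f"unsupported volume: {volume!r}")
        attrs["volume"] = volume
    if pitch is not None:
        # The list above describes pitch as a number, for example "75".
        attrs["pitch"] = str(int(pitch))
    if range_ is not None:
        if range_ not in RANGES:
            raise ValueError(f"unsupported range: {range_!r}")
        attrs["range"] = range_
    element = ET.Element("prosody", attrs)
    element.text = text
    return element


def emphasis(child, level="strong"):
    """Wrap an element in <emphasis> with one of the listed levels."""
    if level not in LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    element = ET.Element("emphasis", {"level": level})
    element.append(child)
    return element


def build_example():
    """Return the example sentence from this section as an SSML string."""
    speak = ET.Element("speak")  # root element name assumed from the SSML spec
    speak.text = "Did you mean the "
    emphasised = emphasis(prosody("green", pitch=75), level="strong")
    emphasised.tail = " beans?"
    speak.append(emphasised)
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_example())
```

The resulting string can be passed to eSpeak as marked-up text; with the command-line program this is normally done with the `-m` option, which tells it that the input contains markup.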
eSpeak can speak HTML text directly, or text containing both SSML and HTML markup.
Any unrecognised tags are ignored.
The following tags cause a sentence break: br, dd, li, img, td.
The following tags cause a paragraph break: h1, h2, h3, h4, hr.
Text between the following tags is ignored: script, style.
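The HTML rules above can be modelled with a short Python sketch built on the standard `html.parser` module. This is only a model of the behaviour described here, not eSpeak's own parser: the class name `BreakModel`, the function `model_breaks`, and the event labels are hypothetical, and recording a break at the opening tag is an assumption.

```python
from html.parser import HTMLParser

# Tag names copied from the lists above.
SENTENCE_BREAK = {"br", "dd", "li", "img", "td"}
PARAGRAPH_BREAK = {"h1", "h2", "h3", "h4", "hr"}
IGNORED_CONTENT = {"script", "style"}


class BreakModel(HTMLParser):
    """Collect text and break events in document order."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._skip_depth = 0  # greater than zero while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_CONTENT:
            self._skip_depth += 1
        elif tag in PARAGRAPH_BREAK:
            self.events.append(("paragraph-break", tag))
        elif tag in SENTENCE_BREAK:
            self.events.append(("sentence-break", tag))
        # Any other tag is treated as unrecognised and produces no event.

    def handle_endtag(self, tag):
        if tag in IGNORED_CONTENT and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.events.append(("text", text))


def model_breaks(html):
    """Return the list of (kind, value) events for an HTML fragment."""
    parser = BreakModel()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>One<br>Two</p><script>var x = 1;</script>"
    for event in model_breaks(sample):
        print(event)
```

In this model the `p` tag produces no event because it is not in any of the lists above, and the content of the `script` element never reaches the output.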
- Speech Synthesis Markup Language (SSML) Version 1.0. W3C Recommendation, 7 September 2004. W3C.
- Speech Synthesis Markup Language (SSML) Version 1.1. W3C Recommendation, 7 September 2010. W3C.
- SSML 1.0 say-as attribute values. W3C NOTE, 26 May 2005. W3C.
- HTML 5.2. W3C Recommendation, 14 December 2017. W3C.
- HTML Living Standard. Continually updated. WHATWG.