Getting started with Speech Synthesis Markup Language (SSML)
The World Wide Web Consortium (W3C) created Speech Synthesis Markup Language (SSML) as an XML-based markup language to assist in generating natural-sounding synthesized speech. The Plivo Speak XML element supports the generation of SSML-based speech, powered by Amazon Polly. It supports 27 languages and more than 40 voices, and allows developers to control pronunciation, pitch, and volume.
Here’s how to use SSML within Plivo <Speak> XML elements:
To synthesize SSML speech on Plivo, specify one of the Amazon Polly voices in the voice attribute of Plivo’s <Speak> XML tag. Note that Polly voice names must be namespaced with the Polly. prefix, as in Polly.Joey.
For example:
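A minimal Python sketch of that structure, using only the standard library's xml.etree.ElementTree module rather than a Plivo Server SDK. The helper name build_speak_response is hypothetical. It assumes the prefixed voice form Polly.<VoiceName> and a language attribute on <Speak> with the value en-US; check both against Plivo's XML reference before relying on them.

```python
import xml.etree.ElementTree as ET


def build_speak_response(voice_name: str, ssml_body: str, language: str = "en-US") -> str:
    """Wrap an SSML fragment in a <Response><Speak> document (hypothetical helper).

    Assumption: Polly voices are written as "Polly.<VoiceName>", e.g. "Polly.Joey".
    """
    # Add the Polly. prefix if the caller passed a bare voice name.
    voice = voice_name if voice_name.startswith("Polly.") else f"Polly.{voice_name}"
    response = ET.Element("Response")
    speak = ET.SubElement(response, "Speak", voice=voice, language=language)
    # Parse the fragment so SSML tags become child elements instead of escaped text.
    fragment = ET.fromstring(f"<Speak>{ssml_body}</Speak>")
    speak.text = fragment.text
    for child in fragment:
        speak.append(child)  # each child keeps its .tail text
    return ET.tostring(response, encoding="unicode")


print(build_speak_response("Joey", 'Hello.<break time="1s"/>Thanks for calling.'))
```

Parsing the SSML fragment, rather than assigning it to speak.text, matters: assigned as plain text, ElementTree would escape <break> to &lt;break&gt; and the tag would be read aloud instead of producing a pause.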
SSML tags
You can use these SSML tags within Plivo XML.
SSML Tag | Action | Description |
---|---|---|
<break> | Add a pause | Use this tag to include a pause in the speech. |
<emphasis> | Emphasize words | Use this tag to change the speaking rate and volume of the speech. |
<lang> | Specify another language for specific words | Use this tag to set the natural language of the text. |
<p> | Add a pause between paragraphs | Use this tag to represent a paragraph. |
<phoneme> | Use phonetic pronunciation | Use this tag to set phonetic pronunciation for specific text. |
<prosody> | Control volume, speaking rate, and pitch | Use this tag to modify the volume, speaking rate, and pitch of the tagged text. |
<s> | Add a pause between sentences | Use this tag to represent a sentence. This adds a strong break before and after the tag. |
<say-as> | Control how special types of words are spoken | Use this tag to describe how to interpret the text. |
<sub> | Pronounce acronyms and abbreviations | Use this tag to pronounce the specified words or phrases as different words or phrases. |
<w> | Improve pronunciation by specifying parts of speech | Use this tag to customize the pronunciation of a word by specifying its part of speech. |
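To see several of these tags in one place, the sketch below holds an illustrative document in a Python string and parses it with xml.etree.ElementTree, which raises a ParseError if any tag is mismatched or left unclosed. The sentences and attribute values (level, time, rate, pitch, volume, xml:lang, alphabet, ph, alias) follow Amazon Polly's SSML conventions and are assumptions here, not values confirmed by Plivo; <say-as> and <w> are left out because they have their own examples later on this page.

```python
import xml.etree.ElementTree as ET

# Illustrative document; attribute values follow Amazon Polly's SSML conventions
# and are assumptions, not values confirmed by Plivo.
SSML_SAMPLE = """<Response>
  <Speak voice="Polly.Joey" language="en-US">
    <p>
      <s>Welcome to <sub alias="Speech Synthesis Markup Language">SSML</sub>.</s>
      <s>This part is <emphasis level="strong">really</emphasis> important.</s>
    </p>
    <p>
      <s>Please hold.<break time="2s"/>Thanks for waiting.</s>
      <s><prosody rate="slow" pitch="low" volume="loud">This is slow, low, and loud.</prosody></s>
      <s>In French, hello is <lang xml:lang="fr-FR">bonjour</lang>.</s>
      <s>You say <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.</s>
    </p>
  </Speak>
</Response>"""

# ET.fromstring raises xml.etree.ElementTree.ParseError on malformed markup.
root = ET.fromstring(SSML_SAMPLE)
speak = root.find("Speak")

# Collect the SSML tag names used inside <Speak>, excluding <Speak> itself.
ssml_tags = sorted({el.tag for el in speak.iter()} - {"Speak"})
print(ssml_tags)
# ['break', 'emphasis', 'lang', 'p', 'phoneme', 'prosody', 's', 'sub']
```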
Note: Plivo doesn’t support these Amazon Polly-specific tags in Plivo XML:
- <amazon:auto-breaths>
- <amazon:effect name="drc">
- <amazon:effect phonation="soft">
- <amazon:effect vocal-tract-length>
- <amazon:effect name="whispered">
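Because Plivo XML rejects these tags, a pre-flight check can flag them before the XML is served. The find_unsupported_tags function below is a hypothetical helper, not part of any Plivo SDK: it scans for opening tags in the amazon: prefix with a regular expression and tolerates stray whitespace such as <amazon: effect.

```python
import re

# Matches an opening tag in the amazon: prefix, tolerating whitespace around the colon.
# Closing tags like </amazon:effect> are not matched because "/" follows "<".
_AMAZON_TAG = re.compile(r"<\s*amazon\s*:\s*([\w-]+)")


def find_unsupported_tags(ssml: str) -> list[str]:
    """Return the amazon:-prefixed tag names found in an SSML string (hypothetical helper)."""
    return [f"amazon:{name}" for name in _AMAZON_TAG.findall(ssml)]


print(find_unsupported_tags('Hi <amazon:effect name="whispered">there</amazon:effect>.'))
```

A regular expression is used instead of an XML parser on purpose: ElementTree reports an "unbound prefix" error for amazon:-prefixed tags that lack a namespace declaration, so it cannot parse such input as-is.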
SSML voices
Plivo supports these Amazon Polly voices for use with Plivo XML:
Character limit
To ensure quick synthesis, Plivo caps the length of text that can be synthesized in one <Speak> tag at 3,000 characters.
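For longer scripts, one approach is to split the text before building the XML and place each piece in its own <Speak> element, assuming the <Response> may contain several <Speak> elements in sequence. The split_for_speak helper below is a hypothetical sketch: it packs whole sentences into chunks of at most 3,000 characters and hard-splits any single sentence that exceeds the limit. It assumes the cap applies to plain text; this page does not say whether SSML markup counts toward the 3,000 characters, so leave headroom if you add tags.

```python
import re

MAX_SPEAK_CHARS = 3000  # per-<Speak> character cap stated above


def split_for_speak(text: str, limit: int = MAX_SPEAK_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Whole sentences are packed together where possible; a sentence longer
    than `limit` is hard-split. Assumes the cap applies to plain text only.
    """
    chunks: list[str] = []
    current = ""
    # Split after ., !, or ? followed by whitespace (a simple sentence heuristic).
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_for_speak("One. Two! Three?", limit=9))
```

Splitting on ., !, and ? is only a heuristic: an abbreviation such as "Dr." also creates a split point, which is harmless here because it only changes where a chunk is allowed to break.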
Pricing
Support for SSML-based speech synthesis is currently in beta and free for all Plivo users. We expect to eventually charge for text-to-speech on the basis of the number of characters synthesized.
SSML support in Plivo Server SDKs
SSML tags are supported in all of our Server SDKs.
Example
This example uses the Joey voice for US English (en-US). Use the voice attribute of the <Speak> tag to specify the voice for your text.
say-as
The say-as tag describes how to interpret the text.
The rendered XML document would be:
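A sketch that produces such a document with Python's standard xml.etree.ElementTree module, assuming the Joey voice and the en-US language attribute described above. The say_as_document helper and the confirmation-code sentence are illustrative; spell-out is one of the interpret-as values documented for Amazon Polly.

```python
import xml.etree.ElementTree as ET


def say_as_document(lead_text: str, value: str, interpret_as: str) -> str:
    """Render a <Response><Speak> document containing one <say-as> element (hypothetical helper)."""
    response = ET.Element("Response")
    speak = ET.SubElement(response, "Speak", voice="Polly.Joey", language="en-US")
    speak.text = lead_text
    # "interpret-as" contains a hyphen, so it is passed in an attribute dict.
    say_as = ET.SubElement(speak, "say-as", {"interpret-as": interpret_as})
    say_as.text = value
    say_as.tail = "."
    return ET.tostring(response, encoding="unicode")


print(say_as_document("Your confirmation code is ", "ABC123", "spell-out"))
# <Response><Speak voice="Polly.Joey" language="en-US">Your confirmation code is <say-as interpret-as="spell-out">ABC123</say-as>.</Speak></Response>
```

Building the tree instead of concatenating strings means ElementTree escapes characters such as & and < in the spoken text automatically.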
w
The w tag lets you customize the pronunciation of a word by specifying its part of speech.
The rendered XML document would be:
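A sketch of a w example built with Python's standard xml.etree.ElementTree module, again assuming the Polly.Joey voice and en-US language. The add_word helper is hypothetical, and the role values amazon:VB (present-tense verb) and amazon:VBD (past-tense verb) come from Amazon Polly's SSML documentation; confirm Plivo accepts them for your voice.

```python
import xml.etree.ElementTree as ET


def add_word(parent: ET.Element, word: str, role: str, tail: str) -> None:
    """Append <w role="..."> containing `word`, followed by `tail` text (hypothetical helper)."""
    w = ET.SubElement(parent, "w", role=role)
    w.text = word
    w.tail = tail


response = ET.Element("Response")
speak = ET.SubElement(response, "Speak", voice="Polly.Joey", language="en-US")
speak.text = "I "
# The same spelling "read" gets a present-tense reading first, then a past-tense one.
add_word(speak, "read", "amazon:VB", " a book every week, and yesterday I ")
add_word(speak, "read", "amazon:VBD", " two.")

xml_doc = ET.tostring(response, encoding="unicode")
print(xml_doc)
# <Response><Speak voice="Polly.Joey" language="en-US">I <w role="amazon:VB">read</w> a book every week, and yesterday I <w role="amazon:VBD">read</w> two.</Speak></Response>
```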