Chapter 10. Text-to-Speech (TTS)
Speech synthesis, when done right, is an alternative (and complement) to human narration that avoids the costs of creating and distributing recorded audio. The mere thought of synthesized speech is enough to make some people cringe, though, because it’s still typically equated with the likes of poor old much-maligned Microsoft Sam and his tinny, often-incomprehensible renderings. Modern high-end voices are getting harder and harder to distinguish as synthesized, however, and the voices on most reading systems and computers are getting progressively more natural sounding and pleasant to the ears for extended listening.
But whatever you think of the voices, the ability to synthesize the text of your ebook is always going to be vital to a segment of your readers, especially when human narration is not available. It’s also generally useful to the broader reading demographic, as you’ll see later in this chapter.
And the voice issues are a bit of a red herring. The real issue here is not how the voices sound but the mispronunciations the rendering engines make and how often they make them. The constant mispronunciation of words disrupts comprehension and ruins reading enjoyment, because it breaks the narrative flow and leaves the reader to guess what the engine was actually trying to speak. It doesn’t have to be this way, though; the errors occur because the mechanisms to enhance default synthetic renderings haven’t been made ...