Speech Synthesis

The term speech synthesis refers to the technologies that enable computers or other electronic systems to output simulated human speech. These systems produce acoustic output that is phonologically acceptable and meaningful to human listeners. Speech synthesis has an even longer history than speech recognition, but it is still an evolving technology, used for reading computer screens aloud and for providing verbal instructions, feedback, or assistance. Speech synthesis technology can be divided into two categories: concatenated synthesis and formant synthesis.

Concatenated Synthesis

Concatenated synthesis uses a computer to assemble recorded voice sounds into meaningful speech output. Because concatenated synthesis uses recorded human voice sounds, it tends to sound more natural than formant synthesis, which uses machine-generated speech. The basic process for developing a concatenated synthesizer is to have a human reader record units of speech and then store those recordings. The stored units are assembled on demand according to given business rules. For many applications this approach is cost-prohibitive because of the storage space it requires, the computational power needed for assembly, and the very large number of speech units needed for natural-sounding speech. Concatenated synthesis works best for systems requiring a small vocabulary.
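The record, store, and assemble-on-demand process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular product: the unit inventory (`UNITS`), the sine tones standing in for human recordings, the 8 kHz sample rate, and the short linear crossfade at each join are all hypothetical choices made for the example. The "business rules" are reduced to an ordered list of requested units, and a small helper estimates raw storage to show why a large inventory becomes expensive.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz (hypothetical)


def tone(freq_hz: float, duration_s: float) -> list[float]:
    """Stand-in for a recorded unit: a short sine tone (hypothetical data)."""
    n = round(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit inventory. In a real system each entry would be a
# human reader's recording of a word or sub-word unit, loaded from storage.
UNITS: dict[str, list[float]] = {
    "your": tone(220, 0.25),
    "balance": tone(330, 0.40),
    "is": tone(262, 0.15),
    "ten": tone(392, 0.30),
    "dollars": tone(294, 0.45),
}


def crossfade_concat(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two units, linearly crossfading `overlap` samples to soften the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming unit rises toward 1
        mixed.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return a[:-overlap] + mixed + b[overlap:]


def synthesize(words: list[str], overlap: int = 80) -> list[float]:
    """Assemble output by looking up each requested unit and joining them in order."""
    missing = [w for w in words if w not in UNITS]
    if missing:
        # A concatenated system can only say what was recorded.
        raise KeyError(f"no recorded unit for: {missing}")
    out: list[float] = []
    for w in words:
        out = crossfade_concat(out, UNITS[w], overlap) if out else list(UNITS[w])
    return out


def storage_bytes(units: dict[str, list[float]], bytes_per_sample: int = 2) -> int:
    """Raw storage for the inventory, assuming 16-bit mono samples."""
    return sum(len(samples) for samples in units.values()) * bytes_per_sample


if __name__ == "__main__":
    audio = synthesize("your balance is ten dollars".split())
    print(len(audio), "samples,", storage_bytes(UNITS), "bytes of stored units")
```

The crossfade is one simple way to hide discontinuities at the joins; the `KeyError` path reflects the small-vocabulary limitation, since any word outside the recorded inventory cannot be spoken at all.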

One might assume that the requisite voice sounds typically recorded for assembly ...

Get Designing Effective Speech Interfaces now with O’Reilly online learning.