The MPEG-4 Book, by Touradj Ebrahimi and Fernando Pereira

12.3. Text-to-Speech Interface

The Text-to-Speech Interface (TTSI) generates synthetic speech from textual data, supplied either as text or as phonemes. Because only textual data is transmitted rather than an audio waveform, speech can be conveyed at very low bit rates (200 bit/s to 1.2 kbit/s). Speech synthesis is useful in many kinds of multimedia applications, so MPEG-4 defines flexible means for using it in different situations. In addition to the plain text, MPEG-4 TTSI allows the following information to be supplied:

  • Speaker-related information (speech rate, age, and gender of the speaker);

  • Prosody (e.g., time-dependent variation of pitch);

  • Language code, or lip shape information when used for video dubbing; and

  • Face animation-related parameters when used in synchronization ...
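To make the idea of plain text plus optional side information concrete, the following is a minimal Python sketch. It is not the MPEG-4 TTSI bitstream syntax: the class `TtsInput`, its field names, and the helper `average_bitrate_bps` are hypothetical names chosen for illustration. The sketch counts only the text payload and ignores the size of any side information, so it gives a rough feel for why text-driven speech fits within a few hundred bits per second.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtsInput:
    """Hypothetical container for text plus optional side information.

    Field names are illustrative assumptions, not MPEG-4 syntax elements.
    """
    text: str
    speech_rate: Optional[float] = None   # speaker-related information
    age: Optional[int] = None             # speaker-related information
    gender: Optional[str] = None          # speaker-related information
    language_code: Optional[str] = None   # language of the text

    def payload_bytes(self) -> int:
        # Only the UTF-8 encoded text is counted in this sketch;
        # side information is deliberately left out of the estimate.
        return len(self.text.encode("utf-8"))


def average_bitrate_bps(payload_bytes: int, duration_s: float) -> float:
    """Average bit rate needed to send payload_bytes within duration_s seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return payload_bytes * 8 / duration_s


# Assumed example: a short sentence spoken over two seconds.
utterance = TtsInput(text="Hello, this is synthetic speech.",
                     speech_rate=1.0, language_code="en")
rate = average_bitrate_bps(utterance.payload_bytes(), duration_s=2.0)
print(f"{utterance.payload_bytes()} bytes over 2.0 s -> {rate:.0f} bit/s")
```

The two-second duration is an assumed figure, not a measured one; a real estimate would also have to account for whatever side information accompanies the text.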
