The Text-To-Speech Interface (TTSI) generates synthetic speech from textual data, supplied either as plain text or as phonemes. This framework allows speech to be transmitted at very low bit rates (200 bit/s to 1.2 kbit/s). Speech synthesis is useful in many kinds of multimedia applications, so MPEG-4 defines flexible means for using it in different situations. In addition to the plain text, MPEG-4 TTSI allows the following information to be supplied:
Speaker-related information (speech rate, age, and gender of the speaker);
Prosody (e.g., time-dependent variation of pitch);
Language code, or lip shape information when used for video dubbing; and
Face animation-related parameters when used in synchronization ...
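As a rough illustration of the kinds of information listed above, the following Python sketch groups the plain text with its optional side information and estimates the bit rate of the text alone. It is only a sketch under stated assumptions: the class `TtsiInput`, its field names, its units, and the helper `text_bit_rate` are hypothetical names chosen for this example and do not reproduce the MPEG-4 bitstream syntax. Lip shape and face animation parameters are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TtsiInput:
    """Hypothetical container for TTSI input (not the MPEG-4 bitstream syntax)."""
    text: str                              # plain text or a phoneme string
    is_phonemes: bool = False              # True if `text` holds phonemes
    speech_rate: Optional[float] = None    # speaker-related; units are assumed
    age: Optional[int] = None              # speaker-related
    gender: Optional[str] = None           # speaker-related
    # Prosody as (time in seconds, pitch in Hz) pairs; representation is assumed.
    pitch_contour: List[Tuple[float, float]] = field(default_factory=list)
    language_code: Optional[str] = None

    def side_info(self) -> dict:
        """Return only the optional fields that were actually supplied."""
        candidates = {
            "speech_rate": self.speech_rate,
            "age": self.age,
            "gender": self.gender,
            "language_code": self.language_code,
        }
        info = {key: value for key, value in candidates.items() if value is not None}
        if self.pitch_contour:
            info["pitch_contour"] = list(self.pitch_contour)
        return info


def text_bit_rate(payload: str, duration_s: float) -> float:
    """Average bit rate of the raw UTF-8 text, ignoring side information and framing."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(payload.encode("utf-8")) * 8 / duration_s


request = TtsiInput(text="Hello", gender="female", language_code="en")
print(request.side_info())
print(text_bit_rate("a" * 50, 2.0))
```

In this simplified model, 50 single-byte characters spoken over two seconds amount to 200 bit/s for the text itself, which suggests why a text-based representation can stay within the low bit-rate range quoted above. Any side information would add to that figure.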