PART|I Signal Processing, Modelling and Related Mathematical Tools
written inputs and outputs are important means of communication
but also because speech-based interaction is often made via textual
transcriptions of spoken inputs and outputs. Whatever the medium
carrying the information, the interaction is based on the assumption
that both participants understand each other and produce meaningful
responses. Speech recognition and speech synthesis are, therefore,
necessary but not sufficient to build a complete voice-enabled interface.
Humans and machines have to manipulate meanings (either for
understanding or producing speech and text) and to plan the interaction.
In this contribution, NLP will be considered from three points of view.
First, natural and spoken language understanding (NLU or SLU), which
aims at extracting meanings from text or spoken inputs. Second, natural
language generation (NLG), which involves the transcription of a sequence
of meanings into a written text. Finally, dialogue processing (DP), which
analyses and models the interaction from the system’s side and feeds
NLG systems according to the meanings extracted from users’ inputs
by NLU.
4.2 NATURAL LANGUAGE UNDERSTANDING
The job of an NLU system is to extract meanings from text inputs. In the
case of SLU, the preceding processing stages (such as automatic speech
recognition, ASR; see Chapter 3) are error-prone and can add semantic
noise such as hesitations, stop words, etc. This has to be taken into
account. To do so, some NLU systems are closely coupled to the
ASR system and use some of its internal results (N-best lists, lattices or
confidence scores). Other NLU systems maintain several hypotheses
so as to propagate uncertainty until it can be resolved by the
context.
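As an illustration of the second strategy, the sketch below keeps an N-best list alive and lets a context score pick the final hypothesis. It is a minimal sketch under stated assumptions, not the method of any particular system: the function names, the multiplicative combination of scores and the toy confidence values are all hypothetical.

```python
from typing import Callable, List, Tuple

# A hypothesis is a word sequence with an ASR confidence in [0, 1].
Hypothesis = Tuple[str, float]


def rerank_nbest(
    nbest: List[Hypothesis],
    context_score: Callable[[str], float],
) -> List[Hypothesis]:
    """Multiply each ASR confidence by a context score and sort best-first.

    The multiplicative combination is an assumption made for illustration.
    """
    rescored = [(text, conf * context_score(text)) for text, conf in nbest]
    return sorted(rescored, key=lambda h: h[1], reverse=True)


def expects_city(text: str) -> float:
    """Hypothetical context: the system has just asked for a destination city."""
    cities = {"boston", "austin"}
    return 1.0 if any(word in cities for word in text.lower().split()) else 0.1


# Toy N-best list: the acoustically preferred hypothesis is semantically poor.
nbest = [
    ("i want to go to bust on", 0.50),
    ("i want to go to boston", 0.45),
]
best_text, best_score = rerank_nbest(nbest, expects_city)[0]
```

Here the hypothesis with the lower ASR confidence wins once the dialogue context is taken into account, which is the kind of late disambiguation described above.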
Assuming that the input is a correct word sequence, most NLU
systems can be decomposed into three steps: syntactic parsing,
semantic parsing and contextual interpretation. In the following, a brief
overview of each step is given. For further details about the basic
ideas and methods of NLU, readers are invited to refer to [1].
4.2.1 Syntactic Parsing
Before trying to extract any meaning out of a sentence, the syntactic
structure of this sentence is generally analysed: the function of each
word (part of speech), the way words are related to each other, how
they are grouped into phrases and how they can modify each other.
This helps to resolve some ambiguities, such as homographs (or
homophones) that can have different functions. For instance, the word
‘fly’ can be a noun (the insect) or a verb, and the word ‘flies’ can stand
for the plural form of the noun or an inflexion of the verb, as shown in
Figure 4.1.
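The ambiguity can be made concrete with a small lexicon that lists every possible function of each word. The sketch below is only an illustration: the lexicon, the tag names (A, N, V, as in Figure 4.1) and the function name are assumptions, and words missing from the lexicon would raise a KeyError.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon: each word maps to all of its possible parts of speech.
LEXICON: Dict[str, Set[str]] = {
    "the": {"A"},
    "fly": {"N", "V"},
    "flies": {"N", "V"},
}


def tag_sequences(sentence: str) -> List[Tuple[str, ...]]:
    """Enumerate every part-of-speech sequence the lexicon allows."""
    words = sentence.lower().split()
    options = [sorted(LEXICON[word]) for word in words]
    return list(product(*options))


candidates = tag_sequences("The fly flies")
```

For ‘The fly flies’ the lexicon alone leaves four candidate tag sequences; it is the syntactic analysis that retains only the sequence article, noun, verb.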
Most syntactic representations of language are based on the notion
of context-free grammars (CFG) [2]. Sentences are then split into a
hierarchical structure (Figure 4.1). Most early syntactic parsing
algorithms, aiming at creating this parse tree, were developed with the
goal of analysing programming languages rather than natural language
[3]. Two main techniques are used for describing grammars and
implementing parsers: context-free rewrite rules and transition
networks [4].
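To give an idea of the second technique, the sketch below encodes a very simple transition network whose arcs are labelled with word categories. It is a toy finite-state illustration under stated assumptions, not the formalism of [4]: the state names, the lexicon and the function name are hypothetical, and the network has no recursion.

```python
from typing import Dict, Set, Tuple

# Hypothetical lexicon with ambiguous entries for 'fly' and 'flies'.
LEXICON: Dict[str, Set[str]] = {
    "the": {"A"},
    "fly": {"N", "V"},
    "flies": {"N", "V"},
}

# Arcs of the network: (state, word category) -> next state.
ARCS: Dict[Tuple[str, str], str] = {
    ("q0", "A"): "q1",
    ("q1", "N"): "q2",
    ("q2", "V"): "q3",
}
FINAL_STATES: Set[str] = {"q3"}


def accepts(sentence: str) -> bool:
    """Follow every arc allowed by the categories of each word in turn."""
    states = {"q0"}
    for word in sentence.lower().split():
        categories = LEXICON.get(word, set())
        states = {
            ARCS[(state, category)]
            for state in states
            for category in categories
            if (state, category) in ARCS
        }
    return bool(states & FINAL_STATES)
```

With these assumptions, `accepts("The fly flies")` and `accepts("The flies fly")` both succeed, because the position of each word in the network selects its category, while `accepts("Fly the flies")` fails at the first arc.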
For instance, a grammar capturing the syntactic structure of the
first sentence in Figure 4.1 can be expressed by a set of rewrite rules
as follows:
1. S → NP VP
2. NP → A N
3. VP → V
4. A → The
5. N → fly
6. V → flies
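The six rewrite rules above can be turned directly into a small top-down parser. The following is a minimal sketch rather than a reference implementation: the dictionary encoding, the function names and the lowercasing of the input are assumptions, and this naive strategy would loop forever on a left-recursive grammar (which the grammar above is not).

```python
from typing import Dict, Iterator, List, Tuple

# The six rewrite rules, with terminals lowercased (an assumption).
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["A", "N"]],
    "VP": [["V"]],
    "A": [["the"]],
    "N": [["fly"]],
    "V": [["flies"]],
}


def expand(symbol: str, words: List[str], start: int) -> Iterator[Tuple[object, int]]:
    """Yield (tree, next position) for each way `symbol` derives words[start:...]."""
    if symbol not in GRAMMAR:  # terminal symbol: must match the current word
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for right_hand_side in GRAMMAR[symbol]:
        for children, end in expand_sequence(right_hand_side, words, start):
            yield (symbol, children), end


def expand_sequence(symbols: List[str], words: List[str], start: int) -> Iterator[Tuple[list, int]]:
    """Derive a sequence of symbols one after the other, left to right."""
    if not symbols:
        yield [], start
        return
    for tree, middle in expand(symbols[0], words, start):
        for rest, end in expand_sequence(symbols[1:], words, middle):
            yield [tree] + rest, end


def parse(sentence: str) -> List[object]:
    """Return every parse tree of the whole sentence rooted in S."""
    words = sentence.lower().split()
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


trees = parse("The fly flies")
```

`parse("The fly flies")` yields a single tree with the shape S(NP(A, N), VP(V)), whereas `parse("The flies fly")` returns an empty list: as written, rules 5 and 6 only cover the first sentence of Figure 4.1, and the second one would require the additional rules N → flies and V → fly.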
Legend: Si: Sentence; A: Article; N: Noun; V: Verb; NP: Noun Phrase; VP: Verb Phrase
S1, ‘The fly flies’: [S [NP [A The] [N fly]] [VP [V flies]]]
S2, ‘The flies fly’: [S [NP [A The] [N flies]] [VP [V fly]]]
FIGURE 4.1 Syntactic parsing.
