Chapter | 14 Interactive Representations of Multimodal Databases
collections makes the management of the indexing structures complex. Distributed and parallel processing is certainly a key issue in truly attaining the large-scale objective. Its role is to divide the computational effort of indexing and retrieval operations over a number of
CPUs and memory devices. Although feature extraction is somewhat
easily distributed over several CPUs with a coarse-grain strategy,
obtaining efficiently distributed indexing and learning procedures is
more challenging, especially in the context of multimodal retrieval.
The large-scale distributed indexing research field has been inves-
tigated for more than a decade. As a result, many distributed and
parallel structures allowing nearest-neighbour (NN) search in sub-
linear complexity have been proposed, and they are routinely used
nowadays in commercial applications [41]. Distributed inverted files
[42, 43], parallel VA-files [44] and parallel metric trees [45] are highly
relevant approaches here.
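The scatter-gather pattern underlying such distributed indexes can be sketched in a few lines: the collection is partitioned into shards, each worker computes a local top-k, and the coordinator merges the partial results. The sketch below is a minimal, single-process illustration of that pattern (brute-force distance computation, illustrative names), not the structure of any specific system cited above.

```python
import heapq
import numpy as np

def local_knn(shard, query, k):
    """Each worker ranks its own shard by Euclidean distance (brute force)."""
    dists = np.linalg.norm(shard - query, axis=1)
    idx = np.argsort(dists)[:k]
    return [(float(dists[i]), i) for i in idx]

def distributed_knn(shards, query, k):
    """Scatter the query to all shards, then merge the partial top-k lists."""
    partial = []
    for s_id, shard in enumerate(shards):
        partial.extend((d, s_id, i) for d, i in local_knn(shard, query, k))
    return heapq.nsmallest(k, partial)  # global top-k across all shards

rng = np.random.default_rng(0)
collection = rng.normal(size=(1000, 16))   # toy feature vectors
shards = np.array_split(collection, 4)     # partition over 4 "workers"
query = rng.normal(size=16)
print(distributed_knn(shards, query, k=3))
```

Real systems replace the per-shard brute-force scan with a sublinear structure (an inverted file, VA-file or metric tree), but the merge step stays the same.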
On the basis of the aforementioned multimodal representations, search
and retrieval operations may be initiated. Many current information
management systems are centred on the notion of a query. This is true
over the Web (with all classical Web Search Engines), and for Digital
Libraries. In the domain of multimedia, available commercial applications offer rather simple management services, whereas research prototypes also aim at responding to queries. The notion of
browsing may be used to extend query-based operations or may sim-
ply form an alternative access mode in several possible contexts that
we detail in the following sections.
14.3.1 Browsing as Extension of the Query
Formulation Mechanism
In the most general case, multimedia browsing is designed to supplement search operations. This stems from the fact that multimedia querying systems largely demonstrate their capabilities using the query-by-example (QBE) scenario, which rarely corresponds to a realistic usage scenario. Multimedia search systems are mostly based on content similarity. Hence, to fulfil
an information need, the user must express it with respect to relevant
(positive) and non-relevant (negative) examples [46]. From there,
some form of learning is performed, to retrieve the documents that
are the most similar to the combination of relevant examples and
dissimilar to the combination of non-relevant examples (see Section
14.4.1). The question then arises of how to find the initial exam-
ples themselves. Researchers have therefore investigated new tools
and protocols for the discovery of relevant bootstrapping examples.
These tools often take the form of browsing interfaces whose aim is to help the user explore the information space and locate the sought items.
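The combination of relevant and non-relevant examples mentioned above can be made concrete with Rocchio-style relevance feedback, a standard technique from information retrieval. The sketch below is one plausible instantiation, not necessarily the learning scheme of the systems cited in [46]; the weights `alpha` and `beta` are illustrative.

```python
import numpy as np

def rocchio(positives, negatives, alpha=1.0, beta=0.5):
    """Build a query point attracted to relevant examples and repelled
    from non-relevant ones (Rocchio-style relevance feedback)."""
    q = alpha * np.mean(positives, axis=0)
    if len(negatives):
        q -= beta * np.mean(negatives, axis=0)
    return q

def rank(collection, q):
    """Rank documents by increasing distance to the feedback query."""
    d = np.linalg.norm(collection - q, axis=1)
    return np.argsort(d)

rng = np.random.default_rng(1)
docs = rng.normal(size=(200, 8))
pos, neg = docs[:3], docs[100:102]     # user-marked examples
order = rank(docs, rocchio(pos, neg))
print(order[:5])                       # most promising documents first
```

Each feedback round refines `q`, pulling the ranking towards the region of feature space the user has marked as relevant.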
The initial query step of most QBE-based systems consists in
showing images in random sequential order over a 2D grid [46].
This follows the idea that a random sampling will be representative
of the collection content and allow for choosing relevant examples.
However, the chance of gathering sufficient relevant examples is low, and much effort must be spent in guiding the system towards the relevant region of the information space where the sought items may lie.
Similarity-based visualisation [47–55] organises images with respect
to their perceived similarities. Similarity is mapped onto the notion
of distance so that a dimension reduction technique may generate a
2D or 3D representation in which images may be organised. It is further known that high dimensionality affects the meaningfulness of the distances defined [56]. This is known as the curse of dimensionality (see Chapter 2), and several results show that high-dimensional spaces should be avoided where possible.
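The effect can be observed numerically: as dimensionality grows, the relative contrast between the nearest and farthest neighbour shrinks, so distances become less discriminative. The following toy experiment (illustrative, not from the chapter) shows this with uniformly distributed points.

```python
import numpy as np

rng = np.random.default_rng(42)
for dim in (2, 10, 100, 1000):
    pts = rng.uniform(size=(500, dim))
    d = np.linalg.norm(pts - pts[0], axis=1)[1:]   # distances to one point
    contrast = (d.max() - d.min()) / d.min()       # relative spread
    print(f"dim={dim:5d}  relative contrast={contrast:.2f}")
```

The printed contrast drops sharply with dimension, which is exactly why similarity-based visualisation first reduces the feature space before laying images out.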
A number of methods exist to achieve dimension reduction. We do
not detail the list and principles here but refer the reader to [57–60]
for thorough reviews on the topic.
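As a minimal illustration of the idea, the sketch below projects high-dimensional feature vectors onto two axes with PCA, one of the simplest reduction techniques (not the method used in the figures that follow); the colour-histogram features are synthetic stand-ins.

```python
import numpy as np

def pca_2d(features):
    """Project feature vectors onto their two principal axes,
    giving 2D coordinates for a map-like image display."""
    X = features - features.mean(axis=0)
    # principal axes via SVD of the centred data matrix
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T

rng = np.random.default_rng(7)
colour_histograms = rng.random((50, 64))   # toy 64-bin colour features
coords = pca_2d(colour_histograms)
print(coords.shape)                        # one (x, y) position per image
```

Images placed at these coordinates end up near their perceptual neighbours, which is the organising principle of the similarity-based displays discussed next.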
Figure 14.1a illustrates the organisation of an image collection based on colour information using the HDME dimension reduction technique [58]. This type of display may be used to capture feedback by letting
the user reorganise or validate the displayed images. Figure 14.1b
shows a screenshot of the interface of the El Niño system [61] with
such a configuration.
Specific devices may then be used to perform search operations.
Figure 14.2 shows operators sitting around an interactive table for
FIGURE 14.1 (a) Dimension reduction over a database of images and (b) interface
of the El Niño system [61].
FIGURE 14.2 The PDH table and its artistic rendering (from [50]).
handling personal photo collections [50]. Figure 14.3 shows an operator manipulating images in front of a large multi-touch display.
Alternative item organisations have also been proposed, such as the Ostensive Browsers (see Figure 14.4 and [62]) and interfaces associated with the NN paradigm [38].
All these interfaces have in common that they bring multimedia retrieval much closer to human factors and therefore require specific evaluation procedures, as detailed in Section 14.3.4. Although somewhat different, it is worth mentioning here the development of