
“4137X˙CH02˙Akerkar” — 2007/9/20 — 10:12 — page 20 — #2
20 CHAPTER 2 Information Retrieval
2.2 Document Representation
Documents on the Web come in a variety of formats, and the information they contain may be
text, graphics, audio, or video. This chapter deals with traditional text-based information
retrieval. Multimedia information retrieval is still in its infancy (some initial attempts are
discussed in Chapter 7). The IR process described here is based on text extracted from
different types of documents. We will look at some of the tools that make it possible
to retrieve text from various document formats. Traditional ...