
“4137X˙CH02˙Akerkar” — 2007/9/20 — 10:12 — page 31 — #13
2.2 Document Representation 31
Readers can see the effect of running the Stemmer on Tokenized-d1.txt in the subdirectory fig2.3 by typing the following command:
java -cp ../java Stemmer Tokenized-d1.txt
(Command 2.2)
The output of (Command 2.2), redirected to a file, is stored in Stemmed-d1.txt in the subdirectory fig2.3.
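The source of the Stemmer class is not reproduced in this section, so the sketch below only illustrates the general idea of stemming: reducing inflected word forms to a common stem. It is a hypothetical toy suffix stripper (the class name SimpleStemmer, its suffix list and the minimum stem length are all assumptions made for illustration). It is not the book's Stemmer and not the Porter algorithm. It also assumes the tokenized input file holds whitespace-separated tokens.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy stemmer for illustration only: it is NOT the book's Stemmer class
// and NOT the Porter algorithm, just a crude suffix stripper.
class SimpleStemmer {
    // Suffixes are tried in this order; "es" is checked before "s" so "boxes" -> "box".
    private static final String[] SUFFIXES = {"ing", "ed", "ly", "es", "s"};
    // Never shorten a word below this many characters (so "sing" stays "sing").
    private static final int MIN_STEM = 3;

    // Lowercase the word and strip the first matching suffix, if the stem stays long enough.
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= MIN_STEM) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Stem every token of an already tokenized document, preserving order.
    static List<String> stemAll(List<String> tokens) {
        List<String> stems = new ArrayList<>();
        for (String token : tokens) {
            stems.add(stem(token));
        }
        return stems;
    }

    // Usage (assumed): java SimpleStemmer Tokenized-d1.txt > out.txt
    // With no argument, a few built-in demo tokens are stemmed instead.
    public static void main(String[] args) throws IOException {
        List<String> tokens = args.length > 0
                ? Arrays.asList(Files.readString(Path.of(args[0])).trim().split("\\s+"))
                : List.of("connected", "connecting", "documents", "boxes");
        for (String s : stemAll(tokens)) {
            System.out.println(s);
        }
    }
}
```

Running it without arguments prints one stem per line: "connect", "connect", "document" and "box". The fixed suffix list keeps the sketch short, but it mishandles many English words (for example, "notes" becomes "not"). A real stemmer such as Porter's applies ordered, condition-guarded rewrite rules rather than plain suffix removal.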
2.2.2 Term-Document Matrix
A term-document matrix (TDM) is a two-dimensional representation of a document collection.
Each row of the matrix corresponds to a document, and each column corresponds to an
index term. The values in the matrix can be either ...