Chapter 2. Portal technology 31
system and a categorizer. It is the use of the categorization system that
gives Eureka! its high accuracy.
The Eureka! categorizer and associated data can be viewed as a black box that
accepts text as HTML, XML, or flat text, and outputs a list of one or more
categories into which the text has been categorized, as well as a score for each.
Optionally, it may also detect the presence of one or more phrases or terms that
are then mapped to a specific category. The categorizer is available in multiple
languages; however, each language requires a separate invocation of the
categorizer. The categorizer requires that the calling application fetch the text to
be categorized and handle the resulting o ...
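The black-box contract described above (text in, scored categories out, with optional phrase-to-category mapping and one invocation per language) can be illustrated with a minimal sketch. This is not the Eureka! API: the names categorize, CategoryScore, KEYWORDS, and PHRASES are hypothetical, and a toy keyword count stands in for the real categorization logic, which the text does not describe.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryScore:
    category: str
    score: float


# Hypothetical per-language data. Each call handles exactly one language,
# mirroring the "separate invocation per language" requirement.
KEYWORDS = {
    "en": {
        "Finance": {"bank", "loan", "interest"},
        "Travel": {"flight", "hotel", "airport"},
    },
}

# Hypothetical phrases that map directly to a specific category.
PHRASES = {
    "en": {"mortgage rate": "Finance"},
}


def strip_markup(text, fmt):
    """Reduce HTML or XML input to plain text; pass flat text through."""
    if fmt in ("html", "xml"):
        return re.sub(r"<[^>]+>", " ", text)
    if fmt == "text":
        return text
    raise ValueError(f"unsupported format: {fmt}")


def categorize(text, fmt="text", language="en"):
    """Return CategoryScore items for text the caller has already fetched."""
    plain = strip_markup(text, fmt).lower()
    words = re.findall(r"[a-z]+", plain)
    scores = {}

    # Toy scoring: fraction of words that are keywords of the category.
    for category, keywords in KEYWORDS[language].items():
        hits = sum(1 for word in words if word in keywords)
        if hits:
            scores[category] = hits / len(words)

    # Optional phrase detection: a matched phrase forces its category.
    for phrase, category in PHRASES[language].items():
        if phrase in plain:
            scores[category] = max(scores.get(category, 0.0), 1.0)

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [CategoryScore(category, score) for category, score in ranked]
```

As in the description, the caller supplies the text and consumes the result; the sketch does no fetching and no storage of its own. A call such as categorize(page_source, fmt="html", language="en") returns the categories ordered from highest to lowest score.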