Program: Data Mining
Suppose that I, as a published author, want to track how my book is selling in comparison to others. This information can be obtained for free just by clicking on the page for my book on any of the major bookseller sites, reading the sales rank number off the screen, and typing the number into a file, but that’s tedious. As I somewhat haughtily wrote in the book that this example looks for, “computers get paid to extract relevant information from files; people should not have to do such mundane tasks.”
This program uses the regular expressions API and, in particular, newline matching to extract a value from an HTML page. It also reads from a URL (discussed later in Section 17.7). The pattern to look for is something like the following; bear in mind that the HTML may change at any time, so I want to keep the pattern fairly general:
<b>QuickBookShop.web Sales Rank: </b> 26,252 </font><br>
As the pattern may extend over more than one line, I read the entire web page from the URL into a single long string using my FileIO.readerAsString() method (see Section 9.6) instead of the more traditional line-at-a-time paradigm. I then plot a graph using an external program (see Section 26.2); this could (and should) be changed to use a Java graphics program. The complete program is shown in Example 4-2.
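Before the full listing, the core idea (read everything into one string, then match a pattern that tolerates line breaks) can be sketched in isolation. This is only an illustration under stated assumptions, not the BookRank program: it uses the standard java.util.regex package rather than the org.apache.regexp classes that the listing imports, the names RankSketch and extractRank are hypothetical, and the readerAsString method here is a simple stand-in written for this sketch, not the FileIO implementation from Section 9.6.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; class and method names are hypothetical. */
class RankSketch {

    // The label, then any mix of whitespace and HTML tags, then the number.
    // \s also matches line terminators, so the match may span several lines.
    private static final Pattern RANK = Pattern.compile(
        "Sales Rank:\\s*(?:<[^>]*>\\s*)*([0-9][0-9,]*)");

    /** Reads all characters from the Reader into one String
     *  (a stand-in for the author's FileIO.readerAsString). */
    static String readerAsString(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    /** Returns the sales rank with its commas removed, or -1 if no match is found. */
    static int extractRank(String html) {
        Matcher m = RANK.matcher(html);
        if (!m.find()) {
            return -1;
        }
        return Integer.parseInt(m.group(1).replace(",", ""));
    }

    public static void main(String[] args) throws IOException {
        // Sample text modeled on the pattern shown earlier, with a line break added.
        String page = "<b>QuickBookShop.web Sales Rank: \n</b> 26,252 </font><br>";
        System.out.println(extractRank(readerAsString(new StringReader(page))));
    }
}
```

To run this against a live page, the StringReader could be replaced by a Reader wrapped around the URL's input stream, for example an InputStreamReader over the stream returned by java.net.URL's openStream() method. Keeping the pattern anchored only on the label text and the digits, rather than on the exact tags, is one way to stay "fairly general" if the site's HTML changes.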
Example 4-2. BookRank.java
import java.io.*;
import com.darwinsys.util.FileIO;
import java.net.*;
import java.text.*;
import java.util.*;
import org.apache.regexp.*;

/** Graph ...