Program: LinkChecker

One of the hard parts of maintaining a large web site is ensuring that all the hypertext links, images, applets, and so forth remain valid as the site grows and changes. It’s easy to make a change somewhere that breaks a link somewhere else, exposing your users to those “Doh!”-producing 404 errors. What’s needed is a program to automate checking the links. This turns out to be surprisingly complex due to the variety of link types. But we can certainly make a start.

Since we already created a program that reads a web page and extracts the URL-containing tags (Section 17.9), we can use that here. The basic approach of our new LinkChecker program is this: given a starting URL, create a GetURLs object for it. If that succeeds, read the list of URLs and go from there. This program also displays the structure of the site, using simple indentation in a graphical window, as shown in Figure 17-3.

Figure 17-3. LinkChecker in action
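One reason link checking gets complicated is that a link taken from a page may be relative, or may use a scheme that cannot be fetched at all. The following is a minimal sketch of that first step, not the code from Example 17-8: the class name LinkKind, its methods, the category names, and the example.com URLs are all hypothetical and chosen only for illustration. It resolves a link against the page it came from and decides, by scheme alone, how the link might be handled.

```java
import java.net.URI;

/** Hypothetical helper (not part of LinkChecker): resolve a link and classify it by scheme. */
public class LinkKind {

    /** Resolve a possibly relative href against the URL of the page that contains it. */
    public static URI resolve(String pageUrl, String href) {
        // URI.create and URI.resolve(String) throw IllegalArgumentException on malformed input
        return URI.create(pageUrl).resolve(href);
    }

    /** Assumed categories: "fetch", "skip", "local", "unsupported", "unknown". */
    public static String classify(URI link) {
        String scheme = link.getScheme();
        if (scheme == null) {
            return "unknown";          // still relative: nothing to resolve it against
        }
        switch (scheme.toLowerCase()) {
            case "http":
            case "https":
                return "fetch";        // can be checked by requesting the page
            case "mailto":
                return "skip";         // no document to retrieve
            case "file":
                return "local";        // could be checked against the filesystem
            default:
                return "unsupported";
        }
    }

    public static void main(String[] args) {
        URI link = resolve("http://example.com/docs/index.html", "../images/logo.gif");
        System.out.println(link + " -> " + classify(link));
    }
}
```

A real checker would go on to open a connection for the "fetch" links and look at the response; that part is left out here so the sketch runs without a network.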

With the GetURLs class from Section 17.9 in hand, the rest is largely a matter of elaboration. A lot of this code has to do with the GUI (see Chapter 13). The code uses recursion: the routine checkOut( ) calls itself each time a new page or directory is started.
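The shape of that recursion can be sketched without any GUI or network code. The version below is an illustration under stated assumptions, not the listing in Example 17-8: the class name RecursiveCheckSketch is hypothetical, an in-memory Map stands in for fetching a page and reading its links, and the visited set (added here so that pages linking back to each other do not recurse forever) is an assumption of this sketch. Each level of recursion adds two spaces of indentation, mimicking the indented display described above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a recursive, indenting link walk over an in-memory "site". */
public class RecursiveCheckSketch {

    // Assumption: page URL -> links found on that page; stands in for reading real pages
    private final Map<String, List<String>> site;
    private final Set<String> visited = new HashSet<>();
    private final StringBuilder report = new StringBuilder();

    public RecursiveCheckSketch(Map<String, List<String>> site) {
        this.site = site;
    }

    /** Report one URL, then recurse into the links it contains. */
    public void checkOut(String url, int depth) {
        for (int i = 0; i < depth; i++) {
            report.append("  ");                       // two spaces per nesting level
        }
        List<String> links = site.get(url);
        if (links == null) {                           // no such page: a broken link
            report.append(url).append(" BROKEN\n");
            return;
        }
        if (!visited.add(url)) {                       // seen before: do not descend again
            report.append(url).append(" (already checked)\n");
            return;
        }
        report.append(url).append('\n');
        for (String link : links) {
            checkOut(link, depth + 1);                 // the recursive step
        }
    }

    public String report() {
        return report.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = Map.of(
            "/index.html", List.of("/about.html", "/missing.html"),
            "/about.html", List.of("/index.html"));
        RecursiveCheckSketch checker = new RecursiveCheckSketch(site);
        checker.checkOut("/index.html", 0);
        System.out.print(checker.report());
    }
}
```

Passing the depth down as a parameter keeps the indentation logic in one place; a GUI version could append each line to a text component instead of a StringBuilder.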

Example 17-8 shows the code for the LinkChecker program.

Example 17-8.

/** A simple HTML Link Checker.
 * Need a Properties file to set depth, ...
