Chapter 8. HTML in Swing
As anyone who has ever tried to write code to read HTML can tell you,
it’s a painful experience. The problem is that although there
is an HTML specification, no web designer or browser vendor actually
follows it. And the specification itself is extremely loose. Element
names may be uppercase, lowercase, or mixed case. Attribute values
may or may not be quoted. If they are quoted, either single or double
quotes may be used. The < sign may be escaped as &lt; or it may just be left raw in the file. The <P> tag may be used to begin or end a paragraph. Closing </P>, </LI>, and </TD> tags may or may not be used. Tags may or may not overlap. There are
just too many different ways of doing the same thing to make parsing
HTML an easy task. In fact, the difficulties encountered in parsing
real-world HTML were one of the prime motivators for inventing the
much stricter XML, in which what is and is not allowed is
precisely specified and all browsers are strictly prohibited from
accepting documents that don’t measure up to the standard. With HTML,
by contrast, most browsers try to fix up bad markup. That leniency
leads to the proliferation of nonconformant HTML on the Web, which
all browsers must then try to parse.
Fortunately, as of JFC 1.1.1 (included in Java 1.2.2), Sun provides
classes for basic HTML parsing and display that shield Java
programmers from most of the tribulations of working with raw HTML.
The javax.swing.text.html.parser package can be used to read ...
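To make the idea concrete, here is a minimal sketch of reading deliberately sloppy HTML with the javax.swing.text.html.parser package. It assumes a ParserDelegator driving an HTMLEditorKit.ParserCallback is an acceptable way to illustrate the point. The class name SloppyHtmlDemo and the helper method startTags are hypothetical names invented for this example. The input mixes uppercase and lowercase element names and omits closing </P> and </LI> tags, as described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class; not part of the JDK or of the original text.
public class SloppyHtmlDemo {

    // Parses the HTML string and returns the name of every start tag
    // the parser reports, in document order. This includes tags the
    // parser supplies itself, such as an implied <html> or <body>.
    static List<String> startTags(String html) throws IOException {
        final List<String> tags = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // HTML.Tag names are reported in lowercase regardless of
                // how the element was written in the source.
                tags.add(tag.toString());
            }
        };
        // The third argument tells the parser to ignore any charset
        // declaration in the document; we are reading from a String.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return tags;
    }

    public static void main(String[] args) throws IOException {
        // Mixed-case element names, no closing </P> or </LI> tags.
        String html = "<HTML><Body><P>first<p>second"
                    + "<UL><li>one<LI>two</ul></BODY></html>";
        System.out.println(startTags(html));
    }
}
```

The callback sees a normalized stream of tags, so the calling code does not have to care whether the author wrote <P>, <p>, or left the paragraph unclosed. Other ParserCallback methods, such as handleText and handleEndTag, can be overridden in the same way.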