Chapter 8. HTML in Swing
As anyone who has ever tried to write code to read
HTML can tell you, it’s a painful experience. The problem is
that although there is an HTML specification, no web designer or browser
vendor actually follows it. And the specification itself is extremely
loose. Element names may be uppercase, lowercase, or mixed case. Attribute
values may or may not be quoted. If they are quoted, either single or
double quotes may be used. The < sign may be escaped as &lt; or it
may just be left raw in the file. The <P> tag may be used to begin or end a
paragraph. Closing </P>, </LI>, and </TD> tags may or may not be used. Tags
may or may not overlap. There are just too many different ways of doing
the same thing to make parsing HTML an easy task. In fact, the
difficulties encountered in parsing real-world HTML were one of the prime
motivators for the invention of the much stricter XML, in which what is
and is not allowed is precisely specified and conforming parsers are strictly
prohibited from accepting documents that don’t measure up to the standard
(as opposed to HTML, where most browsers try to fix up bad HTML, thereby
leading to the proliferation of nonconformant HTML on the Web, which all
browsers must then try to parse).
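To make these variations concrete, here is a minimal sketch, assuming Java 8 or later, that feeds three spellings of a link to the HTML parser bundled with Swing (javax.swing.text.html.parser.ParserDelegator driving an HTMLEditorKit.ParserCallback). The first link uses uppercase names and an unquoted value, the second lowercase names and single quotes, the third mixed case and double quotes. The class LooseHtmlDemo and its linkTargets method are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class: shows the Swing parser absorbing case and quoting differences.
public class LooseHtmlDemo {

    /** Collects the HREF value of every A start tag the Swing parser reports. */
    public static List<String> linkTargets(String html) throws IOException {
        final List<String> targets = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int pos) {
                // Tag and attribute names arrive normalized, so one comparison
                // covers <A>, <a>, and any mixed-case spelling.
                if (tag == HTML.Tag.A) {
                    Object href = attributes.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        targets.add(href.toString());
                    }
                }
            }
        };
        // true: ignore any charset declared in the document itself
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return targets;
    }

    public static void main(String[] args) throws IOException {
        // Same kind of link three ways: uppercase/unquoted,
        // lowercase/single-quoted, mixed case/double-quoted.
        String messy = "<A HREF=one.html>one</A> "
                     + "<a href='two.html'>two</a> "
                     + "<A hReF=\"three.html\">three</a>";
        System.out.println(linkTargets(messy));
    }
}
```

Because the parser lowercases element and attribute names and strips either kind of quote, the callback needs a single check against HTML.Tag.A instead of handling every spelling itself; all three HREF values should be reported in document order.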
Fortunately, as of JFC 1.1.1 (included in Java 1.2.2 and
later), Sun provides classes for basic HTML parsing and display that
shield Java programmers from most of the tribulations of working with raw
HTML. The javax.swing.text.html.parser
package can read ...
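As a rough illustration of how the parser package shields a program from raw markup, the sketch below (assuming Java 8 or later) passes deliberately sloppy HTML (unclosed <P> and <LI> tags, mixed-case names, and an escaped &lt;) to ParserDelegator.parse. Only handleText is overridden, so the program sees plain character data with the entity already resolved. HtmlTextExtractor and textRuns are hypothetical names for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: extracts the text content of an HTML document.
public class HtmlTextExtractor {

    /** Returns each run of character data the parser reports, in order. */
    public static List<String> textRuns(Reader in) throws IOException {
        final List<String> runs = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Entities such as &lt; have already been replaced by the parser.
                runs.add(new String(data));
            }
        };
        // true: ignore any charset declared in the document itself
        new ParserDelegator().parse(in, callback, true);
        return runs;
    }

    public static void main(String[] args) throws IOException {
        // Unclosed <P> and <LI> tags, upper- and lowercase names,
        // and an escaped less-than sign.
        String html = "<P>First paragraph<p>Second &lt; third"
                    + "<UL><LI>one<li>two</UL>";
        for (String run : textRuns(new StringReader(html))) {
            System.out.println(run);
        }
    }
}
```

The same callback object could also override handleStartTag or handleEndTag to react to structure; the parser supplies the implied end tags that the source HTML omitted.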