Chapter 8. HTML in Swing

As anyone who has ever tried to write code to read HTML can tell you, it’s a painful experience. The problem is that although there is an HTML specification, no web designer or browser vendor actually follows it. And the specification itself is extremely loose. Element names may be uppercase, lowercase, or mixed case. Attribute values may or may not be quoted. If they are quoted, either single or double quotes may be used. The < sign may be escaped as &lt; or it may just be left raw in the file. The <P> tag may be used to begin or end a paragraph. Closing </P>, </LI>, and </TD> tags may or may not be used. Tags may or may not overlap. There are just too many different ways of doing the same thing to make parsing HTML an easy task. In fact, the difficulties encountered in parsing real-world HTML were one of the prime motivators for inventing the much stricter XML, in which what is and is not allowed is precisely specified and all browsers are strictly prohibited from accepting documents that don’t measure up to the standard. HTML browsers, by contrast, try to fix up bad HTML, which leads to the proliferation of nonconformant HTML on the Web, which all browsers must then try to parse.

Fortunately, as of JFC 1.1.1 (included in Java 1.2.2), Sun provides classes for basic HTML parsing and display that shield Java programmers from most of the tribulations of working with raw HTML. The javax.swing.text.html.parser package can be used to read ...
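To make the idea concrete, here is a minimal sketch of reading deliberately sloppy HTML with the parser package. It assumes the standard Swing classes ParserDelegator and HTMLEditorKit.ParserCallback; the class name TagLister, the method startTags, and the sample markup are hypothetical names invented for this illustration and are not taken from the chapter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for illustration only: lists the start tags
// that the Swing HTML parser reports for a piece of markup.
public class TagLister {

    /** Returns the names of the start tags the parser reports, in order. */
    public static List<String> startTags(String html) throws IOException {
        final List<String> tags = new ArrayList<>();

        // The parser calls back into this object as it recognizes markup.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                tags.add(tag.toString());
            }
        };

        // The third argument (ignoreCharSet) tells the parser not to stop
        // if the document declares a character set of its own.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return tags;
    }

    public static void main(String[] args) throws IOException {
        // Mixed case, unquoted and single-quoted attributes, no closing </P> tags.
        String messy = "<HTML><body><P align=center>One<p ALIGN='left'>Two</BODY></html>";
        System.out.println(startTags(messy));
    }
}
```

The callback receives HTML.Tag constants rather than raw strings, so differences in case and quoting in the source markup do not reach the calling code. The parser may also report tags it considers implied by the document structure, so the printed list is not guaranteed to match the input tag for tag.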

Java Network Programming, Second Edition by Elliotte Rusty Harold
