Java Cookbook
by Ian F. Darwin
June 2001
Content level: Intermediate to advanced
888 pages
English
O'Reilly Media, Inc.

Extracting HTML from a URL

Problem

You need to extract all the HTML tags from a URL.

Solution

Use this simple HTML tag extractor.

Discussion

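A simple HTML tag extractor can be made by reading one character at a time and looking for the < and > characters that delimit each tag. This is reasonably efficient if a BufferedReader is used, because most single-character reads are then served from an in-memory buffer rather than from the underlying stream.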

The ReadTag program shown in Example 17-5 implements this; given a URL, it opens the URL for reading (similar to TextBrowser in Section 17.7) and extracts the HTML tags. Each tag is printed to the standard output.

Example 17-5. ReadTag.java

import java.io.*;
import java.net.*;

/** A simple but reusable HTML tag extractor. */
public class ReadTag {
    /** The URL that this ReadTag object is reading */
    protected URL myURL = null;
    /** The Reader for this object */
    protected BufferedReader inrdr = null;

    /** Simple main showing one way of using the ReadTag class. */
    public static void main(String[] args)
            throws MalformedURLException, IOException {
        if (args.length == 0) {
            System.err.println("Usage: ReadTag URL [...]");
            return;
        }

        for (int i = 0; i < args.length; i++) {
            // Process each URL named on the command line in turn
            ReadTag rt = new ReadTag(args[i]);
            String tag;
            while ((tag = rt.nextTag()) != null) {
                System.out.println(tag);
            }
            rt.close();
        }
    }

    /** Construct a ReadTag given a URL String */
    public ReadTag(String theURLString)
            throws IOException, MalformedURLException {
        this(new URL(theURLString));
    }

    /** Construct a ReadTag given a URL */
    public ReadTag(URL theURL) throws IOException {
        myURL = theURL;
        // Open the URL for reading
        inrdr = new BufferedReader(new InputStreamReader(myURL.openStream( ...
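
The listing above is cut off before the scanning logic, so the character-at-a-time technique described in the Discussion may be easier to follow in a standalone sketch. The following is not the book's ReadTag code: the class name TagScanSketch and the method extractTags are hypothetical names chosen for illustration, and the input comes from a StringReader rather than a URL so that the example runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of character-at-a-time tag scanning; not the book's ReadTag. */
class TagScanSketch {

    /** Collects every "<...>" run found in the input, in order of appearance. */
    static List<String> extractTags(Reader in) throws IOException {
        // Buffering keeps the single-character reads cheap
        BufferedReader rdr = new BufferedReader(in);
        List<String> tags = new ArrayList<>();
        StringBuilder current = null;   // non-null only while inside a tag
        int c;
        while ((c = rdr.read()) != -1) {
            if (c == '<') {
                // Start (or restart) collecting a tag
                current = new StringBuilder("<");
            } else if (current != null) {
                current.append((char) c);
                if (c == '>') {
                    // End of tag: save it and go back to skipping text
                    tags.add(current.toString());
                    current = null;
                }
            }
        }
        // A tag left unterminated at end of input is discarded
        return tags;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>Hi <b>there</b></body></html>";
        for (String tag : extractTags(new StringReader(html))) {
            System.out.println(tag);
        }
    }
}
```

This sketch treats every < as the start of a tag and the next > as its end, so it makes no attempt to handle comments, script content, or a > inside a quoted attribute value. To scan a real page, the same method could presumably be handed an InputStreamReader wrapped around the stream returned by URL.openStream().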


ISBN: 0596001703