Scraping the Sun: Screen Scraping HTML Pages for Data

Although you won't see any fingernail marks on my computer monitor, I have been doing some screen scraping. This is a colloquial name for extracting data from HTML pages, so called because HTML is designed for display on browser screens rather than for processing by other programs. Ideally, all the data on the Web would also be available in XML, with excellent documentation, and published as a web service with complete metadata. Of course, that's not going to happen very soon, so our programs may occasionally have to read data directly from an HTML document. This is not as trivial ...
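To make the idea of reading data straight out of an HTML document concrete, here is a minimal sketch using only the standard `java.util.regex` package. It is not the approach this chapter goes on to describe. The class name `ScrapeSketch`, the method `extractCells`, and the sample table are all invented for illustration, and the pattern is deliberately naive: it assumes simple, non-nested table cells.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScrapeSketch {
    // Hypothetical pattern: captures whatever sits between <td ...> and </td>.
    // Naive by design; it does not handle nested tables or malformed markup.
    private static final Pattern CELL =
        Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the text of each table cell found in the given HTML, in document order. */
    public static List<String> extractCells(String html) {
        List<String> cells = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            // Drop any inline tags left inside the cell, then trim whitespace.
            cells.add(m.group(1).replaceAll("<[^>]+>", "").trim());
        }
        return cells;
    }

    public static void main(String[] args) {
        // Made-up sample markup, standing in for a page fetched from the Web.
        String html = "<table><tr><td>Sunspots</td><td class=\"n\"><b>42</b></td></tr></table>";
        System.out.println(extractCells(html)); // prints [Sunspots, 42]
    }
}
```

Even this small example hints at why scraping is fragile: the pattern breaks as soon as the page nests tables or changes its layout, which is one reason a real HTML parser is usually the safer choice.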
