“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”
Web pages are a wonderful source of information, but it takes a human to understand what they mean. Wouldn’t it be great if all the information on the Web were available in a form that other programs could easily use? Think of the amazing applications that you could build.
In the early days of the Web, application developers tried to mine information from Web pages programmatically by screen scraping, a technique in which HTML is parsed and its meaning is inferred from assumptions about page layout, table headings, and other clues. Of course, screen scraping is a lost ...