HTML and screen scraping

Although more and more services offer their data through APIs, when a service doesn't, the only way to get its data programmatically is to download its web pages and parse their HTML source code. This technique is called screen scraping.
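The two steps described above, downloading a page and then parsing its HTML, can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not the book's own code. It assumes the page is served as text, falling back to UTF-8 when the server declares no charset. The helper names `fetch_html`, `extract_links`, and `LinkExtractor` are hypothetical. Only the parsing step runs on the sample string at the end, so you can try it without network access.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def fetch_html(url):
    """Step 1 (hypothetical helper): download a page and decode it.

    Assumes UTF-8 when the response declares no charset.
    """
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html):
    """Step 2 (hypothetical helper): parse HTML source and return its links."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered, unterminated input
    return parser.links


# Parsing a hard-coded sample; for a live page you would pass
# fetch_html("https://...") instead of the literal string.
sample = '<p>See <a href="/docs">the docs</a> and <A HREF="https://example.com/">this</A>.</p>'
print(extract_links(sample))  # ['/docs', 'https://example.com/']
```

Subclassing `HTMLParser` and overriding `handle_starttag` is the standard-library way to react to tags as they stream past. Third-party parsers are often more convenient, but this version has no dependencies.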

Though it sounds simple enough in principle, screen scraping should be approached as a last resort. Unlike XML, where the syntax is strictly enforced and data structures are usually reasonably stable and sometimes even documented, web page source code is messy. It is also fluid: the markup can change unexpectedly, in ways that completely break your script and force you to rework the parsing logic from scratch.
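To make the contrast with XML concrete, here is a small sketch that feeds the same fragment to a strict XML parser (`xml.etree.ElementTree`) and to the lenient `html.parser` module. It uses only the standard library. The sample markup is invented for illustration, and `parses_as_xml`, `html_start_tags`, and `TagCollector` are hypothetical helper names. The XML parser rejects the unquoted attribute values, while the HTML parser quietly accepts them along with the unclosed `<li>` tags.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: record the name of every start tag seen."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_start_tags(markup):
    """Parse leniently as HTML and return the start tags encountered."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.start_tags


# Invented sample of typical real-world HTML:
# unquoted attribute values and unclosed <li> elements
messy = "<ul><li class=item>One<li class=item>Two</ul>"

print(parses_as_xml(messy))    # False: the XML parser refuses it
print(html_start_tags(messy))  # ['ul', 'li', 'li']: the HTML parser copes
```

The leniency cuts both ways. Because an HTML parser rarely raises an error, a change in a page's layout usually shows up not as an exception but as a scraper that silently returns the wrong data, or none at all.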

Still, it is sometimes ...
