February 2019
Beginner to intermediate
284 pages
6h 20m
English
Since HTML is a structured document format, a computer program can navigate that structure and extract selected elements, attributes, and text nodes. Most Web extraction tools are based on XPath: an XML standard for navigating an XML structure and selecting elements, attributes, and text nodes using path notation. Although HTML is not as strict as XML, it has a similar structure that can be addressed with XPath paths, and XPath is supported by many Web scraping tools.
For example, the first lines of the previous web page have the following structure:
<html>
  <head>
    <title>Planetary Fact Sheet</title>
  </head>
  <body bgcolor=FFFFFF>
    <p>
    <hr>
    <H1>Planetary Fact Sheet - Metric</H1>
    <hr>
    <p>
    <table> ...
It's not ...
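As a rough illustration of path notation, the sketch below selects the title, the heading, and the bgcolor attribute from the fragment above using Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The PAGE string is an assumption made for this example: a tidied copy of the fragment with tags closed and the attribute quoted so that a strict XML parser accepts it. Real-world HTML usually needs a more lenient parser, such as the third-party lxml.html package.

```python
import xml.etree.ElementTree as ET

# Hypothetical, tidied copy of the fragment: every tag is closed and the
# attribute value is quoted, so the strict XML parser accepts it.
PAGE = """<html>
  <head><title>Planetary Fact Sheet</title></head>
  <body bgcolor="FFFFFF">
    <p/><hr/>
    <H1>Planetary Fact Sheet - Metric</H1>
    <hr/><p/>
    <table/>
  </body>
</html>"""

# fromstring returns the root element, here <html>.
root = ET.fromstring(PAGE)

# Absolute-style path from the root: html -> head -> title.
title = root.findtext("./head/title")

# ".//" searches all descendants; tag names are case-sensitive in XML.
heading = root.findtext(".//H1")

# Attributes are read from the selected element.
bgcolor = root.find("./body").get("bgcolor")

print(title)
print(heading)
print(bgcolor)
```

The same three paths would be written as /html/head/title, //H1, and /html/body/@bgcolor in full XPath; ElementTree's subset expresses them relative to the root element instead.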