August 2018
To build the tree of ElementTree.Element objects representing the HTML document, we used two classes together: HTMLParser to read the HTML text, and TreeBuilder to assemble the elements into a tree.
Every time HTMLParser encounters an opening or a closing tag, it calls handle_starttag or handle_endtag, respectively. In those handlers, we notify TreeBuilder that a new element must be started or that the current element must be closed.
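The interplay between the two classes can be sketched as follows. This is a minimal illustration, not the recipe's full code: the class name TreeBuildingParser and the sample markup are assumptions made for the example, and it only copes with markup where every tag is explicitly closed.

```python
from html.parser import HTMLParser
from xml.etree import ElementTree


class TreeBuildingParser(HTMLParser):
    """Hypothetical name: forwards HTMLParser events to a TreeBuilder."""

    def __init__(self):
        super().__init__()
        self._builder = ElementTree.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes written without a value, so fall back to "".
        self._builder.start(tag, {name: value or "" for name, value in attrs})

    def handle_endtag(self, tag):
        self._builder.end(tag)

    def handle_data(self, data):
        # Text between tags becomes the text/tail of the elements.
        self._builder.data(data)

    def close(self):
        # Flush any buffered input, then return the root Element.
        super().close()
        return self._builder.close()


parser = TreeBuildingParser()
parser.feed("<html><body><p class='intro'>Hello</p></body></html>")
root = parser.close()

print(root.tag)
print(root.find("./body/p").text)
```

Because this sketch forwards events one-to-one, a tag that is opened but never closed (such as a bare br) leaves the builder with an unfinished element, which is why the recipe needs extra bookkeeping on top of it.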
At the same time, we keep track of the last tag that was started (that is, the tag we are currently in) in self._stack. This way, we always know which tag is currently open and has not yet been closed. Every time we encounter a new opening tag or a closing tag, we check whether the last open tag was a self-closing tag; if it was, ...