May 2019
Beginner to intermediate
466 pages
10h 44m
English
Julia's Pkg ecosystem provides access to Gumbo, an HTML parser library. Given an HTML string, Gumbo parses it into a document and its corresponding DOM. This package is an important tool for web scraping with Julia, so let's add it.
As usual, install using the following:
pkg> add Gumbo
julia> using Gumbo
We're now ready to parse the HTML string into a DOM as follows:
julia> dom = parsehtml(resp_body)
HTML Document
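To try the same call without a live HTTP response, a minimal sketch on a small literal string may help. The `sample_html` and `sample_dom` names are hypothetical stand-ins introduced here for illustration; `sample_html` takes the place of `resp_body` and is not the Wikipedia page used in this chapter.

```julia
using Gumbo

# Hypothetical stand-in for resp_body; the real value comes from the HTTP response.
sample_html = "<!DOCTYPE html><html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# parsehtml takes an HTML string and returns a Gumbo.HTMLDocument.
sample_dom = parsehtml(sample_html)

println(typeof(sample_dom))
```

Because the parser only needs a string, it can be exercised on hand-written snippets like this before pointing it at a real page.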
The dom variable now references a Gumbo.HTMLDocument, an in-memory Julia representation of the web page. It's a simple object that has only two fields:
julia> fieldnames(typeof(dom))
(:doctype, :root)
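The two fields can be read directly with dot syntax. The following is a rough sketch on a hand-written document rather than the chapter's page; `sample_dom` is a hypothetical name, and `tag` and `children` are accessor functions exported by Gumbo.

```julia
using Gumbo

# Hypothetical sample document, used here in place of the chapter's resp_body.
sample_dom = parsehtml("<!DOCTYPE html><html><head><title>Demo</title></head><body><p>Hello</p></body></html>")

# doctype holds the document type as a string; root is the top-level <html> element.
println(sample_dom.doctype)
println(tag(sample_dom.root))

# List the tags of the root's direct children, skipping any non-element nodes.
for child in children(sample_dom.root)
    if child isa HTMLElement
        println(tag(child))
    end
end
```

The `isa HTMLElement` guard is there because an element's children can also include text nodes, which have no tag.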
The doctype field corresponds to the HTML <!DOCTYPE html> declaration, which is what the Wikipedia page uses: ...