July 2017
Beginner to intermediate
715 pages
17h 3m
English
Web scraping is the process of extracting information from a web page. The page is typically formatted using a series of HTML tags. An HTML parser is used to navigate through a page or series of pages and to access the page's data or metadata.
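The navigate-and-access step can be sketched with Jsoup, the Java library introduced in the next paragraph. The sketch below is illustrative only. It assumes jsoup is on the classpath, and the class `ScrapeSketch` and its methods are hypothetical names. It parses an HTML string, reads two pieces of metadata (the `<title>` text and a `<meta name="description">` attribute), and collects link targets as data. The CSS-style queries go through jsoup's `select`.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class: parse HTML with Jsoup, then navigate it for data and metadata.
final class ScrapeSketch {

    // Metadata: the text of the page's <title> element ("" if there is none).
    static String title(String html) {
        return Jsoup.parse(html).title();
    }

    // Metadata: the content attribute of <meta name="description"> ("" if absent).
    static String description(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("meta[name=description]").attr("content");
    }

    // Data: every link target, resolved to an absolute URL against baseUri.
    static List<String> links(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element a : doc.select("a[href]")) { // <a> elements that carry an href attribute
            urls.add(a.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title>"
                + "<meta name=\"description\" content=\"A demo page\"></head>"
                + "<body><a href=\"page.html\">relative</a> "
                + "<a href=\"https://other.org/\">absolute</a></body></html>";
        System.out.println(title(html));
        System.out.println(description(html));
        System.out.println(links(html, "https://example.com/docs/"));
    }
}
```

Passing a base URI to `Jsoup.parse` is what lets `absUrl` turn a relative `href` such as `page.html` into a full URL. Without it, relative links would resolve to an empty string.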
Jsoup (https://jsoup.org/) is an open source Java library that facilitates extracting and manipulating HTML documents using an HTML parser. It is used for a number of purposes, including web scraping, extracting specific elements from an HTML page, and cleaning up HTML documents.
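The cleaning use case can be sketched with `Jsoup.clean` and `Jsoup.isValid`. The sketch below is not necessarily how the text applies them. It assumes jsoup 1.14.1 or later, where the allow-list class is `Safelist`; older releases, including those current in 2017, call the same class `Whitelist`. The class name `CleanSketch` is hypothetical. `Safelist.basic()` keeps simple text-formatting tags such as `<p>` and `<b>`. It drops anything else, for example `<script>` elements and `onclick` attributes.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical helper class: strip untrusted markup down to an allow-list.
final class CleanSketch {

    // Keeps only tags/attributes permitted by Safelist.basic(); everything else is removed.
    static String clean(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    // True if the fragment already uses only allowed tags and attributes.
    static boolean isSafe(String html) {
        return Jsoup.isValid(html, Safelist.basic());
    }

    public static void main(String[] args) {
        String dirty = "<p onclick=\"steal()\">Hello <b>world</b><script>alert(1)</script></p>";
        System.out.println(isSafe(dirty));
        System.out.println(clean(dirty));
    }
}
```

An allow-list approach is the safer design here. Anything the list does not explicitly permit is discarded, so unfamiliar or newly invented tags cannot slip through.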
There are several useful ways of obtaining an HTML document. The HTML document can be extracted from a:
The first approach is illustrated next, where the Wikipedia page for ...
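The excerpt's list of document sources is cut off, so the sketch below does not claim to reproduce it. As an assumption, it shows three input routes that Jsoup itself provides:

- an in-memory String, via `Jsoup.parse(String)`
- a local file, via `Jsoup.parse(File, String)`
- a URL fetched over HTTP, via `Jsoup.connect(String).get()`

The sketch assumes jsoup is on the classpath. The class name `DocumentSources`, the UTF-8 charset, the user-agent string and the 10-second timeout are illustrative choices, not taken from the text.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class: three ways to obtain a Jsoup Document.
final class DocumentSources {

    // Parse HTML that is already held in memory as a String.
    static Document fromString(String html) {
        return Jsoup.parse(html);
    }

    // Parse an HTML file from disk, decoding it as UTF-8 (assumed encoding).
    static Document fromFile(File file) throws IOException {
        return Jsoup.parse(file, "UTF-8");
    }

    // Fetch a page over HTTP(S) and parse it; requires network access.
    static Document fromUrl(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (sketch)") // some sites reject requests without a User-Agent
                .timeout(10_000)                   // connect/read timeout in milliseconds
                .get();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Local page</title></head><body><p>Hi</p></body></html>";
        System.out.println(fromString(html).title());

        // Write the same markup to a temporary file and parse it back.
        Path tmp = Files.createTempFile("page", ".html");
        Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
        System.out.println(fromFile(tmp.toFile()).title());
        Files.delete(tmp);

        // Only touch the network when a URL is supplied on the command line.
        if (args.length > 0) {
            System.out.println(fromUrl(args[0]).title());
        }
    }
}
```

To exercise the network path, run the program with a page URL as its first argument. Without one, it uses only the offline String and file sources.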