Extracting web data from a URL using JSoup

A large amount of data, nowadays, can be found on the Web. This data is sometimes structured, semi-structured, or even unstructured. Therefore, very different techniques are needed to extract them. There are many different ways to extract web data. One of the easiest and handy ways is to use an external Java library named JSoup. This recipe uses a certain number of methods offered in JSoup to extract web data.

Getting ready

In order to perform this recipe, we will require the following:

  1. Go to https://jsoup.org/download, and download the jsoup-1.9.2.jar file. Add the JAR file to your Eclipse project an external library.
  2. If you are a Maven fan, please follow the instructions on the download page to include ...

Get Java Data Science Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.