Reading HTML data

pandas uses the lxml Python module internally to read HTML data. You can install it from the command line by running conda install lxml, as shown in the following screenshot:
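After installing, you can confirm from Python that lxml is importable before calling any pandas HTML readers. The following is a minimal sketch that uses only the standard library. The helper name lxml_status is hypothetical and is not part of pandas or lxml.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def lxml_status() -> str:
    """Report whether lxml is importable and, if so, its installed version."""
    # find_spec returns None when the module cannot be found on sys.path.
    if importlib.util.find_spec("lxml") is None:
        return "lxml is not installed; try: conda install lxml"
    try:
        # Look up the version from the installed distribution's metadata.
        return f"lxml {version('lxml')} is installed"
    except PackageNotFoundError:
        # Importable, but no distribution metadata (unusual environments).
        return "lxml is importable, but its version metadata was not found"


print(lxml_status())
```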

We can import HTML data either from local files or directly from the internet:
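The following is a minimal sketch of reading a table from a local HTML file, assuming lxml (or the bs4 and html5lib fallback) is installed. The page content, the file name cities.html, and the helper read_local_html are all hypothetical stand-ins created just for this example. The URL in the final comment is a placeholder, not a real data source.

```python
import os
import tempfile

import pandas as pd

# A tiny stand-in HTML page (hypothetical data, for illustration only).
HTML = """
<html><body>
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
  <tr><td>Shelbyville</td><td>25000</td></tr>
</table>
</body></html>
"""


def read_local_html(path: str) -> pd.DataFrame:
    """Return the first table found in the HTML file at `path`."""
    tables = pd.read_html(path)  # a list of DataFrames, one per table
    return tables[0]


# Write the page to a temporary file, then read it back by path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cities.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(HTML)
    df = read_local_html(path)

print(df)

# Reading from the internet works the same way: pass a URL instead of a path.
# tables = pd.read_html("https://example.com/page-with-tables.html")
```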

Here, we pass the location of the HTML file, or its URL, to the read_html function. read_html extracts the tabular data from the HTML and returns it as a list of pandas DataFrame objects, one for each table it parses. In the following code, we have the data we extracted from the HTML file ...
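Because read_html returns a list, a page with several tables yields several DataFrames. The following is a minimal sketch showing this, along with the match parameter, which keeps only tables whose text matches a string or regular expression. The two-table page is hypothetical. It is wrapped in StringIO because recent pandas versions deprecate passing literal HTML strings directly.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second mentions "Revenue".
PAGE = """
<table>
  <tr><th>name</th><th>team</th></tr>
  <tr><td>Ana</td><td>Sales</td></tr>
</table>
<table>
  <caption>Revenue by quarter</caption>
  <tr><th>quarter</th><th>revenue</th></tr>
  <tr><td>Q1</td><td>100</td></tr>
  <tr><td>Q2</td><td>120</td></tr>
</table>
"""

# Without filtering, read_html returns one DataFrame per <table> element.
all_tables = pd.read_html(StringIO(PAGE))
print(len(all_tables))

# `match` keeps only the tables whose text matches the given string/regex.
revenue_tables = pd.read_html(StringIO(PAGE), match="Revenue")
revenue = revenue_tables[0]
print(revenue)
```

Selecting with match is usually sturdier than relying on a table's position in the list, which can shift when a page layout changes.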
