Example 1 – extracting HTML-based content

In this example, we will use the HTML content from the regexHTML.html file and apply Regex patterns to extract information such as the following:

  • HTML elements
  • The element's attributes (key and values)
  • The element's content

This example gives a general overview of how to handle the elements, attribute values, and other content found in web pages, and of how to apply Regex to extract them. The steps applied in the following code are useful for processing HTML and similar content:

<html>
<head>
   <title>Welcome to Web Scraping: Example</title>
   <style type="text/css">
        ....
   </style>
</head>
<body>
    <h1 style="color:orange;">Welcome to Web Scraping</h1>
    Links: ...
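
As a rough illustration of the three kinds of extraction listed earlier (elements, attributes, and content), the following is a minimal sketch using Python's built-in re module. It is not the book's own code: the html string is an assumed stand-in built only from the fragments visible in the excerpt, and the variable names and patterns are chosen for illustration. Patterns like these work on simple, predictable markup; they are fragile on nested or irregular HTML.

```python
import re

# Assumed sample: only the fragments visible in the excerpt above. The real
# regexHTML.html file contains more markup, which is elided in the source.
html = (
    '<html><head><title>Welcome to Web Scraping: Example</title></head>'
    '<body><h1 style="color:orange;">Welcome to Web Scraping</h1></body></html>'
)

# HTML elements: capture the name that follows "<" in every opening tag.
tags = re.findall(r'<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>', html)

# Attributes: capture key/value pairs written as key="value".
attributes = re.findall(r'([\w-]+)="([^"]*)"', html)

# Content: non-greedy match between an opening tag and its closing tag.
# re.S lets "." match newlines in case the content spans several lines.
title = re.search(r'<title>(.*?)</title>', html, re.S).group(1)
h1 = re.search(r'<h1[^>]*>(.*?)</h1>', html, re.S).group(1)

print(tags)        # ['html', 'head', 'title', 'body', 'h1']
print(attributes)  # [('style', 'color:orange;')]
print(title)       # Welcome to Web Scraping: Example
print(h1)          # Welcome to Web Scraping
```

To run the same patterns against the actual file, the html string could be replaced with the file's contents, for example by reading regexHTML.html with open(...).read().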
