Example 1 – extracting HTML-based content

In this example, we will use the HTML content from the regexHTML.html file and apply Regex patterns to extract information such as the following:

  • HTML elements
  • The element's attributes (key and values)
  • The element's content
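The three kinds of information listed above can each be targeted with a separate pattern. The following is a minimal sketch using Python's standard `re` module on a small, made-up HTML fragment (it is not the content of regexHTML.html). The patterns assume double-quoted attribute values and no nested elements with the same tag name.

```python
import re

# Illustrative fragment only; example.com and the link text are placeholders.
html = ('<h1 style="color:orange;">Welcome to Web Scraping</h1>'
        '<a href="https://example.com" title="Example">Example link</a>')

# 1. HTML elements: capture the tag name of every opening tag.
elements = re.findall(r'<([a-zA-Z][a-zA-Z0-9]*)\b', html)

# 2. Attributes: capture (key, value) pairs written as key="value".
attributes = re.findall(r'([a-zA-Z-]+)="([^"]*)"', html)

# 3. Content: capture the text between an opening tag and its matching
#    closing tag; \1 refers back to the captured tag name.
contents = re.findall(r'<([a-zA-Z][a-zA-Z0-9]*)[^>]*>([^<]*)</\1>', html)

print(elements)    # ['h1', 'a']
print(attributes)  # [('style', 'color:orange;'), ('href', 'https://example.com'), ('title', 'Example')]
print(contents)    # [('h1', 'Welcome to Web Scraping'), ('a', 'Example link')]
```

Because `re.findall` returns tuples when a pattern has more than one group, the attribute and content results come back as ready-made key/value and tag/text pairs.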

This example gives a general overview of how to deal with the elements, attribute values, and text that exist inside web content, and how to apply Regex to extract them. The steps applied in the following code are also useful for processing HTML and similar content:

<html>
<head>
   <title>Welcome to Web Scraping: Example</title>
   <style type="text/css">
        ....
   </style>
</head>
<body>
    <h1 style="color:orange;">Welcome to Web Scraping</h1>
    Links: ...
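As a rough sketch of applying Regex to this page, the code below pulls out the `<title>` text and the `<h1>` element's `style` attribute and content. It works on a trimmed copy of the visible part of the sample (the elided stylesheet and links are omitted, and the closing tags are added so the fragment is complete); reading the real regexHTML.html file is shown only as a comment.

```python
import re

# Trimmed fragment of the sample page; in practice the text would be read
# from the file, e.g. html = open("regexHTML.html", encoding="utf-8").read()
html = """<html><head>   <title>Welcome to Web Scraping: Example</title>
</head><body>    <h1 style="color:orange;">Welcome to Web Scraping</h1>
</body></html>"""

# Text of the <title> element; re.S lets '.' also match newlines,
# and the non-greedy '.*?' stops at the first closing tag.
title = re.search(r'<title>(.*?)</title>', html, re.S).group(1).strip()

# The <h1> element's style attribute value and its text content.
h1 = re.search(r'<h1\s+style="([^"]*)"\s*>(.*?)</h1>', html, re.S)
h1_style, h1_text = h1.group(1), h1.group(2).strip()

print(title)     # Welcome to Web Scraping: Example
print(h1_style)  # color:orange;
print(h1_text)   # Welcome to Web Scraping
```

The non-greedy quantifier matters here: with a greedy `.*` and several elements of the same type on a page, a single match could swallow everything up to the last closing tag.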
