Case 2 – using the XML parser

In this case, we will process the XML content with a PyQuery object, urlXML, which is created with parser='xml':

# Create a PyQuery object using the 'xml' parser
urlXML = pq(xmlFile, parser='xml')
print("Children Length: ", len(urlXML.children()))

The preceding code prints the number of child elements, which is 137, one for each URL:

Children Length: 137

As shown in the following code, the first child element and its inner child contain the URL content we want to extract:

print("First Children: ", urlXML.children().eq(0))
print("Inner Child/First Children: ", urlXML.children().children().eq(0))

First Children: <url xmlns="https://www.sitemaps.org/schemas/sitemap/0.9"><loc>https://webscraping.com</loc></url>
Inner ...
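The child and inner-child navigation can also be tried on a small inline sitemap, without the xmlFile used in this chapter. The following is only a minimal sketch: it swaps in the standard library's xml.etree.ElementTree in place of PyQuery, and the two-entry sitemap, its second URL, and the names SITEMAP, NS, children, first_loc, and locs are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry sitemap standing in for the chapter's xmlFile.
# The namespace URI is copied from the output shown above.
SITEMAP = """<urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://webscraping.com</loc></url>
  <url><loc>https://webscraping.com/about</loc></url>
</urlset>"""

# Prefix-to-URI mapping so that namespaced tags can be searched by name.
NS = {"sm": "https://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(SITEMAP)

# Direct children of <urlset>: one <url> element per listed page.
children = list(root)

# Inner child of the first <url> element: its <loc> element.
first_loc = children[0].find("sm:loc", NS)

# Text of every <loc>, collected in document order.
locs = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

print("Children Length:", len(children))
print("First URL:", first_loc.text)
```

Because the elements live in the sitemap namespace, the lookups pass the NS mapping; searching for a bare loc tag without it would find nothing.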
