In this case, we will be processing XML content with the PyQuery urlXML object, which uses parser='xml':
# creating a PyQuery object using parser 'xml'
urlXML = pq(xmlFile, parser='xml')
print("Children Length: ", len(urlXML.children()))
The preceding code prints the number of child elements under the root, that is, 137 URLs in total:
Children Length: 137
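To check the same idea without the original sitemap file, the following minimal sketch counts the direct children of a sitemap root. It uses the standard library's xml.etree.ElementTree rather than PyQuery, and the SAMPLE_SITEMAP string (including its /about and /blog URLs) is a hypothetical stand-in for the real file:

```python
import xml.etree.ElementTree as ET

# Hypothetical three-entry sitemap used only for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://webscraping.com</loc></url>
  <url><loc>https://webscraping.com/about</loc></url>
  <url><loc>https://webscraping.com/blog</loc></url>
</urlset>"""

# Parse the XML text; root is the <urlset> element.
root = ET.fromstring(SAMPLE_SITEMAP)

# len() on an Element gives the number of its direct children (<url> entries).
child_count = len(root)
print("Children Length:", child_count)
```

Each direct child here is one url entry, so the count corresponds to the number of URLs listed in the sitemap.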
As shown in the following code, the first child element and its inner child contain the URL content we want to extract:
print("First Children: ", urlXML.children().eq(0))
print("Inner Child/First Children: ", urlXML.children().children().eq(0))

First Children: <url xmlns="https://www.sitemaps.org/schemas/sitemap/0.9"><loc>https://webscraping.com</loc></url>
Inner ...
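The sitemap elements carry a default namespace, which affects how child elements are looked up by name. As a rough sketch of reaching the first url entry and its inner loc element, here is the same navigation with the standard library's xml.etree.ElementTree instead of PyQuery; the sm prefix and the SAMPLE string are assumptions made for this example:

```python
import xml.etree.ElementTree as ET

# Map an arbitrary prefix ("sm", chosen for this example) to the sitemap namespace.
NS = {"sm": "https://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical one-entry sitemap mirroring the first child shown in the output.
SAMPLE = (
    '<urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://webscraping.com</loc></url>"
    "</urlset>"
)

root = ET.fromstring(SAMPLE)

# First child of the root: the first <url> element.
first_url = root[0]

# Inner child of that element: <loc>, looked up with its namespace.
first_loc = first_url.find("sm:loc", NS)
print("First URL:", first_loc.text)

# Collect the text of every <loc> under every <url>.
all_locs = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]
```

Looking up loc without the namespace mapping would return None here, because ElementTree stores the tag together with its namespace URI.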