O'Reilly logo

Python: Penetration Testing for Developers by Dave Mound, Benjamin May, Andrew Mabbitt, Terry Ip, Cameron Buchanan, Mohit, Christopher Duffy

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Information gathering of a website from SmartWhois by the parser BeautifulSoup

Consider a situation where you want to glean all the hyperlinks from the webpage. In this section, we will do this by programming. On the other hand, this can also be done manually by viewing the view source of the web page. However this will take some time.

So let's get acquainted with a very beautiful parser called BeautifulSoup. This parser is from a third-party source and is very easy to work with. In our code, we will use version 4 of BeautifulSoup.

The requirement is the title of the HTML page and hyperlinks.

The code is as follows:

import urllib from bs4 import BeautifulSoup url = raw_input("Enter the URL ") ht= urllib.urlopen(url) html_page = ht.read() b_object ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required