Mastering Python for Networking and Security

by José Manuel Ortega
September 2018
Intermediate to advanced
426 pages
10h 46m
English
Packt Publishing

Getting links from a URL with urllib2

This script shows how to extract links from a web page using urllib2 and HTMLParser. HTMLParser is a Python 2 standard-library module that parses HTML-formatted text and calls a handler method for each tag it encounters.

You can get more information at https://docs.python.org/2/library/htmlparser.html.
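
To make the handler-method idea concrete, here is a minimal sketch of how the parser reports what it finds. It assumes Python 3, where the module is imported as html.parser rather than HTMLParser; the EventLogger class and the sample markup are hypothetical and are not part of the book's code.

```python
# Python 3 import; in Python 2 this would be "from HTMLParser import HTMLParser".
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper that records every parser callback as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Text found between tags.
        self.events.append(("data", data))


logger = EventLogger()
logger.feed('<a href="http://example.com">Example</a>')
logger.close()

for event in logger.events:
    print(event)
```

Because attributes are passed as (name, value) pairs, finding a link comes down to looking for the href pair on an a tag, which is what the script in this section does.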

You can find the following code in the get_links_from_url.py file:

#!/usr/bin/python
import urllib2
from HTMLParser import HTMLParser

class myParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if (tag == "a"):
            for a in attrs:
                if (a[0] == 'href'):
                    link = a[1]
                    if (link.find('http') >= 0):
                        print(link)
                        newParse = myParser()
                        newParse.feed(link)

web =  raw_input("Enter url: ")
url = "http://"+web
request = urllib2.Request(url)
handle = urllib2.urlopen(request)
parser = myParser()
...
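
The listing above targets Python 2 and is cut off in this preview. As a rough illustration of the same idea, the following sketch ports it to Python 3 under a few assumptions: urllib.request and html.parser stand in for urllib2 and HTMLParser, and the names LinkCollector, extract_links, and fetch_links are hypothetical. It also differs from the listing in two deliberate ways: it collects links in a list instead of printing them, and it resolves relative links against the page URL instead of keeping only links that contain "http". It does not follow the links it finds; the listing passes the URL string itself to a new parser, so crawling further would mean fetching each link first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical parser that gathers the href of every <a> tag."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return the links found in an HTML string, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def fetch_links(url, timeout=10):
    """Download a page and return its links (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html, base_url=url)


if __name__ == "__main__":
    # Offline demonstration on a small HTML snippet.
    sample = '<p><a href="http://example.com/a">A</a> <a href="/b">B</a></p>'
    print(extract_links(sample, base_url="http://example.com/"))
```

Keeping the parsing step (extract_links) separate from the download step (fetch_links) lets the parser be exercised on a plain string without network access.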

ISBN: 9781788992510