May 2018
Beginner
230 pages
4h 49m
English
In this section, we will learn how to find email addresses on a web page using regular expressions. The approach is simple: first, fetch all the data from a given web page, then apply an email regular expression to extract the addresses.
Let's see the code:
import urllib
import re
from bs4 import BeautifulSoup

url = raw_input("Enter the URL ")
ht = urllib.urlopen(url)
html_page = ht.read()
email_pattern = re.compile(r'\b[\w.-]+?@\w+?\.\w+?\b')
for match in re.findall(email_pattern, html_page):
    print match
The preceding code is very simple. The html_page variable contains all the web page data. The r'\b[\w.-]+?@\w+?\.\w+?\b' regular expression represents the ...
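The listing above is written for Python 2 (raw_input, urllib.urlopen, and the print statement). As a rough sketch of the same approach on Python 3, the block below keeps the regular expression unchanged and splits the work into two helpers. The names fetch_page and extract_emails are illustrative and do not come from the original, and decoding the page as UTF-8 is an assumption that will not hold for every site. The sample string stands in for a downloaded page so that the matching step can be tried without network access.

```python
import re
from urllib.request import urlopen

# Same pattern as in the listing above, unchanged.
EMAIL_PATTERN = re.compile(r'\b[\w.-]+?@\w+?\.\w+?\b')


def fetch_page(url):
    """Download a page and return its body as text.

    Decoding as UTF-8 is an assumption; undecodable bytes are replaced.
    """
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_emails(text):
    """Return every substring of text that matches EMAIL_PATTERN, in order."""
    return EMAIL_PATTERN.findall(text)


if __name__ == "__main__":
    # A hard-coded sample stands in for fetch_page(url) here.
    sample = "<p>Write to alice@example.com or bob.smith@mail.example.org</p>"
    for address in extract_emails(sample):
        print(address)
```

Run on the sample, this prints alice@example.com and bob.smith@mail.example. The second address is cut short because the pattern allows only one dot-separated label after the domain name, so addresses on subdomains are captured only partially. To scan a live page, pass the result of fetch_page(url) to extract_emails instead of the sample string.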