Email address gathering from a web page
In this section, we will learn how to find the email addresses on a web page by using a regular expression. The approach is simple: first, fetch all the data from a given web page, then apply an email regular expression to that data to obtain the email addresses.
Let's see the code:
import urllib
import re
from bs4 import BeautifulSoup

url = raw_input("Enter the URL ")
ht = urllib.urlopen(url)
html_page = ht.read()
email_pattern = re.compile(r'\b[\w.-]+?@\w+?\.\w+?\b')
for match in re.findall(email_pattern, html_page):
    print match
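The preceding code is very simple. The html_page variable contains all the web page data. The r'\b[\w.-]+?@\w+?\.\w+?\b' regular expression represents an email address: one or more word characters, dots, or hyphens, followed by @, a run of word characters, a dot, and another run of word characters, with a word boundary at each end.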
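To see how this pattern behaves, the following minimal sketch applies it to a short string instead of a downloaded page. The sample text and its addresses are hypothetical and serve only as an illustration; the sketch is written for Python 3.

```python
import re

# Same pattern as in the listing above.
email_pattern = re.compile(r'\b[\w.-]+?@\w+?\.\w+?\b')

# Hypothetical sample text, used only for illustration.
sample = "Contact alice@example.com or bob.smith@mail.example.org for details."

matches = email_pattern.findall(sample)
print(matches)
# → ['alice@example.com', 'bob.smith@mail.example']
```

Note that the pattern allows only one dot after the @ sign, so an address whose domain has more than two labels is cut short at the second label. If such addresses matter, the domain part of the pattern would need to be extended.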
Now let's see the output:

The preceding result is correct. I created the web page at the given URL for testing purposes.
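The listing in this section targets Python 2 (raw_input, urllib.urlopen, and the print statement). On Python 3, the same approach might look like the following sketch. The function names find_emails and gather_emails are hypothetical, and both the removal of duplicates and the lenient UTF-8 decoding are assumptions that are not part of the original listing.

```python
import re
from urllib.request import urlopen

# Same pattern as in the original listing.
EMAIL_PATTERN = re.compile(r'\b[\w.-]+?@\w+?\.\w+?\b')


def find_emails(text):
    """Return the unique email-like strings in text, in order of first appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(text):
        # Assumption: duplicates are dropped; the original prints every match.
        if match not in seen:
            seen.append(match)
    return seen


def gather_emails(url):
    """Download the page at url and return the email-like strings in its body."""
    with urlopen(url) as response:
        # Assumption: decode as UTF-8 and replace undecodable bytes,
        # since the page encoding is not known in advance.
        html_page = response.read().decode("utf-8", errors="replace")
    return find_emails(html_page)
```

Calling gather_emails with a URL of your choice returns a list that can be printed one address per line, which mirrors the loop in the original script.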