Parsing e-mail addresses and URLs from text
Parsing elements such as e-mail addresses and URLs is a common task. Regular expressions make finding these patterns easy.
How to do it...
The regular expression pattern to match an e-mail address is as follows:
[A-Za-z0-9._]+@[A-Za-z0-9.]+\.[a-zA-Z]{2,4}
Consider the following example:
$ cat url_email.txt
this is a line of text contains,<email> #[email protected]. </email> and email address, blog "http://www.google.com", [email protected] dfdfdfdddfdf;[email protected]<br /> <a href="http://code.google.com"><h1>Heading</h1>
As we are using extended regular expressions (+, for instance), we should use egrep:
$ egrep -o '[A-Za-z0-9._]+@[A-Za-z0-9.]+\.[a-zA-Z]{2,4}' url_email.txt
[email protected]
[email protected]
[email protected]
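As a minimal sketch of the same technique in script form, the following builds a small sample file and prints each distinct e-mail address once. The sample text, the addresses, and the variable names (sample, email_re, emails) are assumptions made for illustration and are not part of the recipe. It uses grep -E, which enables the same extended regular expression syntax as egrep.

```shell
#!/bin/bash
# Hypothetical sample input; the addresses are placeholders, not from the recipe.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Contact alice@example.com or bob.smith@mail.example.org for details.
Duplicate entry: alice@example.com; not an address: user@localhost
EOF

# The e-mail pattern from this recipe.
email_re='[A-Za-z0-9._]+@[A-Za-z0-9.]+\.[a-zA-Z]{2,4}'

# -E enables extended regular expressions, -o prints only the matched text,
# and sort -u removes repeated addresses.
emails=$(grep -E -o "$email_re" "$sample" | sort -u)
echo "$emails"

rm -f "$sample"
```

Here user@localhost is skipped because the pattern requires a dot followed by two to four letters after the host name.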
The egrep regex pattern for an HTTP URL is as follows:
http://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}
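The URL pattern can be tried out with a short script such as the following sketch. The sample HTML, the host names, and the variable names (sample, url_re, urls) are hypothetical and chosen only for illustration. The sketch writes the character class as [a-zA-Z0-9.-], with the hyphen last, because a backslash inside a POSIX bracket expression is an ordinary character rather than an escape.

```shell
#!/bin/bash
# Hypothetical sample HTML; the hosts are placeholders, not from the recipe.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="http://www.example.com">Home</a>
<a href="http://my-site.example.org/index.html">Docs</a>
EOF

# The hyphen is placed last in the bracket expression so it is matched literally.
url_re='http://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}'

# -E enables extended regular expressions; -o prints only the matched text.
urls=$(grep -E -o "$url_re" "$sample")
echo "$urls"

rm -f "$sample"
```

Note that the pattern matches only the scheme and host name, so the /index.html path in the second link is not part of the output.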
Consider this example:
$ egrep -o "http://[a-zA...