Gathering information about a website from whois.domaintools.com
Consider a situation where you want to glean all the hyperlinks from a web page. This can be done manually by viewing the source of the web page, but that takes time; in this section, we will do it programmatically.
So let's get acquainted with a very beautiful parser called lxml.
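Before the whois example, here is a minimal sketch of the hyperlink-gathering idea itself, using lxml's XPath support on a small inline HTML snippet. The snippet and variable names are illustrative assumptions, not part of the example that follows:

```python
from lxml.html import fromstring

# A made-up page with two links, used only for illustration
html = """
<html><body>
  <a href="http://example.com/a">A</a>
  <p>no link here</p>
  <a href="http://example.com/b">B</a>
</body></html>
"""

tree = fromstring(html)
# '//a/@href' selects the href attribute of every <a> element in the page
links = tree.xpath('//a/@href')
print(links)
```

The same `'//a/@href'` expression works unchanged on HTML fetched from a live site.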
Let's see the code:
- The following modules will be used:
from lxml.html import fromstring
import requests
- When you enter the desired website, the requests module obtains the data of the website:
domain = raw_input("Enter the domain : ")
url = 'http://whois.domaintools.com/' + domain
user_agent = 'wswp'
headers = {'User-Agent': user_agent}
resp = requests.get(url, headers=headers)
html = resp.text
- The following piece of code gets the table from the website data:
tree = fromstring(html)
ip = tree.xpath('//*[@id="stats"]//table/tbody/tr//text()')
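Because the real lookup needs network access, the same `//text()` extraction can be sketched offline against a small hand-written table. The markup below only assumes a stats table shaped roughly like the one on whois.domaintools.com; it is not the site's actual HTML:

```python
from lxml.html import fromstring

# Illustrative stand-in for the whois page's stats table
sample = """
<div id="stats">
  <table><tbody>
    <tr><td>IP Address</td><td>93.184.216.34</td></tr>
    <tr><td>IP Location</td><td>United States</td></tr>
  </tbody></table>
</div>
"""

tree = fromstring(sample)
# //text() yields every text node under each table row, which is why
# the result arrives as a flat list of strings rather than structured rows
rows = tree.xpath('//*[@id="stats"]//table/tbody/tr//text()')
print(rows)
```

This flat-list shape is why the book's code then loops over the result to clean it up.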
- The following for loop removes...