Requesting and downloading dynamic website pages
For websites that contain forms or accept user input, we have to submit a GET or a POST request. Let's try creating GET and POST requests with Python. The query string is the mechanism used to pass key-value pairs in a URL.
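The following is only a minimal sketch of both kinds of request. It assumes the same Python 2 urllib2 environment used by the rest of this chapter's code, and the http://httpbin.org endpoints and form fields are placeholders rather than part of the recipe itself:

import urllib
import urllib2

# GET: encode key-value pairs into a query string and append it to the URL
query = urllib.urlencode({'q': 'python', 'page': '1'})
get_response = urllib2.urlopen('http://httpbin.org/get?' + query)
print(get_response.read())

# POST: passing the encoded data as the second argument sends it in the request body
form_data = urllib.urlencode({'name': 'packt'})
post_response = urllib2.urlopen('http://httpbin.org/post', form_data)
print(post_response.read())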
Escaping invalid characters
In the previous recipe, what happens if we remove the try...except block from the last step?
import re
import urllib2
from os.path import basename
from urlparse import urlsplit

# match absolute or protocol-relative image URLs inside double quotes
pattern = '(http)?s?:?(\/\/[^"]*\.(?:png|jpg|jpeg|gif|svg))'
for line in open('packtpub.txt'):
    for m in re.findall(pattern, line):
        # m[1] is the //host/path part of the match
        fileName = basename(urlsplit(m[1])[2])
        img = urllib2.urlopen('https:' + m[1]).read()
        file = open(fileName, "wb")  # binary mode for image data
        file.write(img)
        file.close()
        break  # only download the first image URL found on each line
The script will fail after a few requests because of errors in the URL format: extra characters appear in some of the extracted URLs, and these malformed URLs make the urllib2 request fail.
How to do it...
It's impossible to remember which characters are invalid in a URL, so rather than fixing them by hand we can escape them programmatically before making the request.
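As a sketch of the idea (the exact code in this recipe may differ), urllib.quote percent-encodes the unsafe characters in a URL; the address below is a made-up example containing spaces and quotes:

import urllib

# quote() percent-encodes characters that are unsafe in a URL,
# leaving '/' untouched by default (safe='/')
raw_url = '//www.packtpub.com/images/cover image "large".png'
print(urllib.quote(raw_url))
# //www.packtpub.com/images/cover%20image%20%22large%22.png

The escaped URL can then be prefixed with 'https:' and passed to urllib2.urlopen exactly as in the previous script.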