Requesting and downloading dynamic website pages
When a website has forms or accepts user input, we have to submit a GET request or a POST request. Let's try creating GET and POST requests with Python. The query string is the mechanism for adding key-value pairs to a URL.
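For example, both kinds of request can be made with Python 2's urllib and urllib2 modules. This is only a minimal sketch: the httpbin.org endpoints and the parameter names are chosen for illustration and are not part of this recipe.

import urllib
import urllib2

# GET: encode the key-value pairs into a query string and append it to the URL.
params = urllib.urlencode({'q': 'python', 'page': '1'})
get_response = urllib2.urlopen('https://httpbin.org/get?' + params)
print(get_response.getcode())

# POST: pass the encoded data as the second argument so urllib2 sends it
# in the request body instead of the URL.
post_data = urllib.urlencode({'username': 'user', 'password': 'secret'})
post_response = urllib2.urlopen('https://httpbin.org/post', post_data)
print(post_response.getcode())

In the GET case the key-value pairs end up in the URL itself; in the POST case they travel in the request body, which is why login forms normally use POST.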
Escaping invalid characters
In the previous recipe, what would happen if we removed the try/except block from the last step?
import re
import urllib2
from os.path import basename
from urlparse import urlsplit

pattern = '(http)?s?:?(\/\/[^"]*\.(?:png|jpg|jpeg|gif|png|svg))'
for line in open('packtpub.txt'):
    for m in re.findall(pattern, line):
        # Build the local file name from the URL path.
        fileName = basename(urlsplit(m[1])[2])
        # Download the image with no error handling around the request.
        img = urllib2.urlopen('https:' + m[1]).read()
        file = open(fileName, "wb")  # binary mode, since image data is bytes
        file.write(img)
        file.close()
        break
The script fails after a few requests because of a malformed URL: some of the scraped URLs contain extra characters that are not valid in a URL, and the urllib2 request fails when it tries to download them.
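You can reproduce the problem in isolation; a minimal sketch, assuming a made-up URL that contains a space:

import urllib2

# A space is not a valid URL character; this URL is illustrative only.
bad_url = 'https://httpbin.org/image/some image.png'

try:
    urllib2.urlopen(bad_url)
except Exception as e:
    # Depending on the Python version, this surfaces as httplib.InvalidURL
    # (the URL is rejected before the request is sent) or as an HTTPError
    # returned by the server.
    print('Request failed: %s' % e)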
How to do it...
It's impossible to remember which characters are invalid...
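One common way to deal with this, assuming Python 2's standard library, is to let urllib.quote percent-encode the unsafe characters instead of handling them by hand; a minimal sketch:

import urllib

# A path fragment containing characters that are not valid in a URL (illustrative).
raw_path = '/free-learning/free learning offer.png'

# quote() percent-encodes unsafe characters; '/' is kept safe by default.
escaped = urllib.quote(raw_path)
print(escaped)  # /free-learning/free%20learning%20offer.png

The escaped path can then be joined back onto the scheme and host before passing the URL to urllib2.urlopen.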