
Digging Deep into Requests

  • 17 min read
  • 16 Jun 2015


In this article by Rakesh Vidya Chandra and Bala Subrahmanyam Varanasi, authors of the book Python Requests Essentials, we are going to deal with advanced topics in the Requests module. There are many more features in the Requests module that make interaction with the web a cakewalk. Let us get to know different ways of using the Requests module, which will help us appreciate the ease of using it.


In a nutshell, we will cover the following topics:

  • Persisting parameters across requests using Session objects
  • Revealing the structure of request and response
  • Using prepared requests
  • Verifying SSL certificate with Requests
  • Body Content Workflow
  • Using generator for sending chunk encoded requests
  • Getting the request method arguments with event hooks
  • Iterating over streaming API
  • Self-describing the APIs with link headers
  • Transport Adapter

Persisting parameters across Requests using Session objects

The Requests module contains a Session object, which has the capability to persist settings across requests. Using this Session object, we can persist cookies, create prepared requests, use the keep-alive feature, and do many more things. The Session object contains all the methods of the Requests API, such as GET, POST, PUT, DELETE, and so on. Before using all the capabilities of the Session object, let us get to know how to use sessions and persist cookies across requests.

Let us use the session method to get the resource.

>>> import requests
>>> session = requests.Session()
>>> response = session.get("https://google.co.in", cookies={"new-cookie-identifier": "1234abcd"})

In the preceding example, we created a session object with Requests, and its get method was used to access a web resource.

The cookie value which we set in the previous example will be accessible through response.request.headers.

>>> response.request.headers
CaseInsensitiveDict({'Cookie': 'new-cookie-identifier=1234abcd', 'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'User-Agent': 'python-requests/2.2.1 CPython/2.7.5+ Linux/3.13.0-43-generic'})
>>> response.request.headers['Cookie']
'new-cookie-identifier=1234abcd'

With a session object, we can specify default values for the properties that need to be sent to the server using GET, POST, PUT, and so on. We achieve this by supplying values to properties such as params, auth, and headers of the Session object.

>>> session.params = {"key1": "value", "key2": "value2"}
>>> session.auth = ('username', 'password')
>>> session.headers.update({'foo': 'bar'})

In the preceding example, we have set default values for the properties params, auth, and headers using the session object. We can override them in a subsequent request, as shown in the following example, if we want to:

>>> session.get('http://mysite.com/new/url', headers={'foo': 'new-bar'})

Revealing the structure of request and response

A Request object is created by the user when he/she tries to interact with a web resource. It is sent to the server as a prepared request, and it contains some optional parameters. Let us have an eagle eye view of the parameters:

  • Method: This is the HTTP method to be used to interact with the web service. For example: GET, POST, PUT.
  • URL: The web address to which the request needs to be sent.
  • headers: A dictionary of headers to be sent in the request.
  • files: This can be used while dealing with multipart uploads. It's a dictionary of files, with the key as the field name and the value as the file object.
  • data: This is the body to be attached to the request.
  • json: There are two cases that come into the picture here:
    • If json is provided, the Content-Type header is changed to application/json and, at this point, json acts as the body of the request.
    • In the second case, if both json and data are provided together, json is silently ignored.

  • params: A dictionary of URL parameters to append to the URL.
  • auth: This is used when we need to specify the authentication to the request. It's a tuple containing username and password.
  • cookies: A dictionary or a cookie jar of cookies which can be added to the request.
  • hooks: A dictionary of callback hooks.
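To make these parameters concrete, here is a short sketch that builds a Request with several of them and prepares it locally, so we can inspect what would actually go over the wire. The endpoint URL and values are hypothetical, and nothing is sent to the network until the prepared request is actually dispatched:

```python
from requests import Request

# Build a Request with several optional parameters; the URL and values
# are made up purely to show where each parameter ends up.
request = Request(
    method='POST',
    url='http://example.com/api',
    headers={'X-Demo': 'yes'},
    params={'page': '1'},
    data={'name': 'requests'},
    cookies={'session-id': 'abc123'},
)
prepared = request.prepare()

print(prepared.method)             # POST
print(prepared.url)                # http://example.com/api?page=1 (params appended)
print(prepared.body)               # name=requests (data form-encoded into the body)
print(prepared.headers['Cookie'])  # session-id=abc123
```

Preparing the request locally like this is a handy way to debug exactly which headers, URL, and body a given combination of parameters produces.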

A Response object contains the response of the server to an HTTP request. It is generated once Requests gets a response back from the server. It contains all of the information returned by the server and also stores the Request object we created originally.

Whenever we make a call to a server using Requests, two major transactions take place, listed as follows:

  • We are constructing a Request object which will be sent out to the server to request a resource
  • A Response object is generated by the requests module

Now, let us look at an example of getting a resource from Python's official site.

>>> response = requests.get('https://python.org')

In the preceding line of code, a Request object gets constructed and sent to 'https://python.org'. The resulting Request object is stored as the response.request attribute. We can access the headers of the Request object that was sent off to the server in the following way:

>>> response.request.headers
CaseInsensitiveDict({'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'User-Agent': 'python-requests/2.2.1 CPython/2.7.5+ Linux/3.13.0-43-generic'})

The headers returned by the server can be accessed through the response's headers attribute, as shown in the following example:

>>> response.headers
CaseInsensitiveDict({'content-length': '45950', 'via': '1.1 varnish', 'x-cache': 'HIT', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=63072000; includeSubDomains', 'vary': 'Cookie', 'server': 'nginx', 'age': '557', 'content-type': 'text/html; charset=utf-8', 'public-key-pins': 'max-age=600; includeSubDomains; ...})

The response object contains different attributes, such as _content, status_code, headers, url, history, encoding, reason, cookies, elapsed, and request.

>>> response.status_code
200
>>> response.url
u'https://www.python.org/'
>>> response.elapsed
datetime.timedelta(0, 1, 904954)
>>> response.reason
'OK'

Using prepared Requests

Every request we send to the server turns into a PreparedRequest by default. The request attribute of the Response object received from an API call or a session call is actually the PreparedRequest that was used.

There might be cases in which we ought to send a request that incurs the extra step of adding different parameters. Parameters can be cookies, files, auth, timeout, and so on. We can handle this extra step efficiently by using a combination of sessions and prepared requests. Let us look at an example:

>>> from requests import Request, Session
>>> header = {}
>>> request = Request('GET', 'some_url', headers=header)

We are trying to send a GET request with a header in the previous example. Now, take an instance where we are planning to send a request with the same method, URL, and headers, but we want to add some more parameters to it. In this condition, we can use the session object to receive the complete session-level state to access the parameters of the initially sent request:

>>> from requests import Request, Session
>>> session = Session()
>>> header = {}
>>> request1 = Request('GET', 'some_url', headers=header)

Now, let us prepare a request using the session object to get the values of the session level state:

>>> prepare = session.prepare_request(request1)

We can now send the prepared request with more parameters, as follows:

>>> response = session.send(prepare, stream=True, verify=True)
>>> response.status_code
200

Voila! Huge time saving!

The prepare_request method prepares the complete request with the supplied parameters. There are also some other methods, such as prepare_auth, prepare_body, prepare_cookies, prepare_headers, prepare_hooks, prepare_method, and prepare_url, which are used to prepare the individual properties.
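A quick way to see session-level state flowing into a prepared request, without touching the network, is to set some defaults on a session and inspect the result of prepare_request. The host name and values below are made up for illustration:

```python
from requests import Request, Session

session = Session()
session.headers.update({'X-App': 'demo'})   # session-level default header
session.params = {'token': 'abc'}           # merged into every request's URL

request = Request('GET', 'http://example.com/items', params={'page': '2'})
prepared = session.prepare_request(request)  # merges session state in

print(prepared.headers['X-App'])   # demo -- inherited from the session
print(prepared.url)                # contains both token=abc and page=2
```

Unlike Request.prepare(), Session.prepare_request merges the session's headers, params, auth, and cookies into the resulting PreparedRequest, which is why the session defaults appear even though the Request itself never mentioned them.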

Verifying an SSL certificate with Requests

Requests provides the facility to verify the SSL certificate for HTTPS requests. We can use the verify argument to control whether the host's SSL certificate is verified.

Let us consider a website which has got no SSL certificate. We shall send a GET request to it with the argument verify.

The syntax to send the request is as follows:

requests.get('no ssl certificate site', verify=True)

As the website doesn't have an SSL certificate, it will result in an error similar to the following:

requests.exceptions.ConnectionError: ('Connection aborted.', error(111, 'Connection refused'))

Let us verify the SSL certificate for a website which is certified. Consider the following example:

>>> requests.get('https://python.org', verify=True)

<Response [200]>

In the preceding example, the result was 200, as the mentioned website is SSL certified.

If we do not want to verify the SSL certificate with a request, then we can pass the argument verify=False. By default, the value of verify is True.

Body content workflow

Take an instance where a continuous stream of data is being downloaded when we make a request. In this situation, the client has to listen to the server continuously until it receives the complete data. Consider also the case of accessing the content from the response first and worrying about the body later. In both of these situations, we can use the stream parameter. Let us look at an example:

>>> requests.get("https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz", stream=True)

If we make a request with the parameter stream=True, the connection remains open and only the headers of the response will be downloaded. This gives us the capability to fetch the content whenever we need by specifying the conditions like the number of bytes of data.

The syntax is as follows:

if int(response.headers['content-length']) < TOO_LONG:
    content = response.content

By setting the parameter stream=True and accessing the response as a file-like object, that is response.raw, or by using the iter_content method, we can iterate over the response data in chunks. This avoids reading larger responses at once.

The syntax is as follows:

iter_content(chunk_size=size in bytes, decode_unicode=False)
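To see iter_content in action without a live connection, we can attach an in-memory buffer as the raw stream of a hand-built Response. This is an artificial setup purely for demonstration; in real use, response.raw is populated by the HTTP connection when stream=True:

```python
import io
from requests.models import Response

# Simulate a streamed body with an in-memory buffer, so iterating in
# fixed-size chunks can be shown without any network access.
response = Response()
response.status_code = 200
response.raw = io.BytesIO(b'This is a body streamed in chunks')

for chunk in response.iter_content(chunk_size=8):
    print(chunk)   # 8-byte chunks; the final chunk is shorter
```

Each iteration hands back at most chunk_size bytes, so a download loop can write each chunk to disk (or process it) without ever holding the full body in memory.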

In the same way, we can iterate through the content using iter_lines method which will iterate over the response data one line at a time.

The syntax is as follows:

iter_lines(chunk_size=size in bytes, decode_unicode=None, delimiter=None)

The important thing to note while using the stream parameter is that the connection is not released when it is set to True, unless all the data is consumed or response.close() is executed.

Keep-alive facility

As urllib3 supports the reuse of the same socket connection for multiple requests, we can send many requests over one socket and receive the responses using the keep-alive feature of the Requests library.

Within a session, this turns out to be automatic. Every request made within a session automatically reuses the appropriate connection by default. The connection that is being used will be released after all the data from the body has been read.

Streaming uploads

A file-like object of massive size can be streamed and uploaded using the Requests library. All we need to do is supply the contents of the stream as a value to the data parameter in the request call, as shown in the following lines.

The syntax is as follows:

with open('massive-body', 'rb') as file:
   requests.post('http://example.com/some/stream/url',
                 data=file)


Using generator for sending chunk encoded Requests

Chunked transfer encoding is a mechanism for transferring data in an HTTP request. With this mechanism, the data is sent in a series of chunks. Requests supports chunked transfer encoding for both outgoing and incoming requests. In order to send a chunk-encoded request, we need to supply a generator for the body.

The usage is shown in the following example:

>>> def generator():
...     yield "Hello "
...     yield "World!"
...
>>> requests.post('http://example.com/some/chunked/url/path',
...               data=generator())
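We can confirm locally, without sending anything, that supplying a generator causes Requests to mark the body as chunked. The URL below is hypothetical and the request is only prepared, never sent:

```python
from requests import Request

def generator():
    yield b'Hello '
    yield b'World!'

# Preparing the request is enough to see the effect: because the body
# length of a generator is unknown, Requests sets the Transfer-Encoding
# header instead of Content-Length.
prepared = Request('POST', 'http://example.com/some/chunked/url/path',
                   data=generator()).prepare()
print(prepared.headers['Transfer-Encoding'])
```

The generator itself is stored as the prepared body and is only consumed chunk by chunk while the request is being sent.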

Getting the request method arguments with event hooks

We can alter portions of the request process and signal event handling using hooks. For example, there is a hook named response, which contains the response generated from a request. Hooks can be passed as a dictionary parameter to the request. The syntax is as follows:

hooks = {hook_name: callback_function, … }

The callback_function parameter may or may not return a value. When it returns a value, it is assumed that it is to replace the data that was passed in. If the callback function doesn't return any value, there won't be any effect on the data.

Here is an example of a callback function:

>>> def print_attributes(response, *args, **kwargs):
...     print(response.url)
...     print(response.status_code)
...     print(response.headers)

If there is an error in the execution of callback_function, you'll receive a warning message in the standard output.

Now, let us print some of the attributes of the response, using the preceding callback function:

>>> requests.get('https://www.python.org/',
...              hooks=dict(response=print_attributes))
https://www.python.org/
200
CaseInsensitiveDict({'content-type': 'text/html; ...})
<Response [200]>

Iterating over streaming API

A streaming API tends to keep the request open, allowing us to collect the stream data in real time. While dealing with a continuous stream of data, to ensure that no messages are missed, we can take the help of iter_lines() in Requests. iter_lines() iterates over the response data one line at a time. This can be achieved by setting the parameter stream to True while sending the request.

It's better to keep in mind that it's not always safe to call the iter_lines() function as it may result in loss of received data.

Consider the following example taken from http://docs.python-requests.org/en/latest/user/advanced/#streaming-requests:

>>> import json
>>> import requests
>>> r = requests.get('http://httpbin.org/stream/4', stream=True)
>>> for line in r.iter_lines():
...     if line:
...         print(json.loads(line))

In the preceding example, the response contains a stream of data. With the help of iter_lines(), we tried to print the data by iterating through every line.

Encodings

As specified in the HTTP protocol (RFC 7230), applications can request the server to return the HTTP responses in an encoded format. The process of encoding turns the response content into an understandable format, which makes it easy to access. When the HTTP header fails to return the type of encoding, Requests will try to guess the encoding with the help of chardet.

If we access the response headers of a request, they contain a content-type key. Let us look at a response header's content-type:

>>> re = requests.get('http://google.com')
>>> re.headers['content-type']
'text/html; charset=ISO-8859-1'

In the preceding example, the content type contains 'text/html; charset=ISO-8859-1'. This happens when Requests finds the charset value to be None and the content-type value to be 'text'.

It follows RFC 7230 and changes the value of charset to ISO-8859-1 in this type of situation. In case we are dealing with a different encoding, such as 'utf-8', we can specify it explicitly by setting the Response.encoding property.
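The effect of Response.encoding can be demonstrated offline by constructing a Response by hand, which is an artificial setup for illustration (normally the content bytes come from the server), and decoding the same bytes under two different encodings:

```python
from requests.models import Response

# UTF-8 bytes for 'Café', decoded first with a wrong charset, then with
# the right one after explicitly setting Response.encoding.
response = Response()
response._content = 'Caf\u00e9'.encode('utf-8')

response.encoding = 'ISO-8859-1'
print(response.text)        # mojibake: the UTF-8 bytes read as Latin-1

response.encoding = 'utf-8'
print(response.text)        # Café -- decoded correctly
```

Because text is recomputed from the raw bytes on each access, changing encoding immediately changes how the same content is decoded.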

HTTP verbs

Requests supports the usage of the full range of HTTP verbs, which are described below. For most of these verbs, url is the only argument that must be passed.

  • GET: This method requests a representation of the specified resource. Apart from retrieving the data, there will be no other effect of using this method.
    Definition: requests.get(url, **kwargs)
  • POST: This verb is used for the creation of new resources. The submitted data will be handled by the server and applied to the specified resource.
    Definition: requests.post(url, data=None, json=None, **kwargs)
  • PUT: This method uploads a representation of the specified URI. If the URI is not pointing to any resource, the server can create a new object with the given data, or it will modify the existing resource.
    Definition: requests.put(url, data=None, **kwargs)
  • DELETE: This is pretty easy to understand. It is used to delete the specified resource.
    Definition: requests.delete(url, **kwargs)
  • HEAD: This verb is useful for retrieving meta-information written in the response headers, without having to fetch the response body.
    Definition: requests.head(url, **kwargs)
  • OPTIONS: This is an HTTP method which returns the HTTP methods that the server supports for a specified URL.
    Definition: requests.options(url, **kwargs)
  • PATCH: This method is used to apply partial modifications to a resource.
    Definition: requests.patch(url, data=None, **kwargs)

Self-describing the APIs with link headers

Take the case of accessing a resource whose information is spread across different pages. If we need to approach the next page of the resource, we can make use of link headers. Link headers contain the metadata of the requested resource, that is, the next-page information in our case.

>>> url = "https://api.github.com/search/code?q=addClass+user:mozilla&page=1&per_page=4"
>>> response = requests.head(url=url)
>>> response.headers['link']
'<https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2&per_page=4>; rel="next", <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=250&per_page=4>; rel="last"'

In the preceding example, we specified in the URL that we want to access page number one, and that it should contain four records. Requests automatically parses the link headers and updates the information about the next page. When we access the link header, it shows the output with the values of the page and the number of records per page.
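Requests also exposes the parsed links through the Response.links dictionary, keyed by each link's rel value. The same parsing can be exercised offline with the helper Requests uses internally, requests.utils.parse_header_links; the header value below is a shortened stand-in mirroring the GitHub response above:

```python
from requests.utils import parse_header_links

# A link header in the same shape as the GitHub response above.
header = ('<https://api.github.com/search/code?page=2&per_page=4>; rel="next", '
          '<https://api.github.com/search/code?page=250&per_page=4>; rel="last"')

for link in parse_header_links(header):
    print(link['rel'], link['url'])
# next https://api.github.com/search/code?page=2&per_page=4
# last https://api.github.com/search/code?page=250&per_page=4
```

On a live response, response.links['next']['url'] gives the same "next" URL directly, which makes paginating through such an API a simple loop.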

Transport Adapter

Transport adapters provide an interface for Requests sessions to connect with HTTP and HTTPS. They help us mimic the web service to fit our needs. With the help of transport adapters, we can configure the request according to the HTTP service we opt to use. Requests ships with a transport adapter called HTTPAdapter.

Consider the following example:

>>> session = requests.Session()
>>> adapter = requests.adapters.HTTPAdapter(max_retries=6)
>>> session.mount("http://google.co.in", adapter)

In this example, we created a request session in which every request we make retries up to six times when the connection fails.
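We can verify the mounting behavior locally, without any network traffic: Session.get_adapter returns the adapter whose URL prefix matches, preferring the most specific mount. The host below is hypothetical:

```python
import requests

session = requests.Session()
adapter = requests.adapters.HTTPAdapter(max_retries=6)
session.mount('https://api.example.com', adapter)   # hypothetical host

# The mounted adapter handles matching URLs; everything else falls back
# to the default adapters mounted at 'http://' and 'https://'.
print(session.get_adapter('https://api.example.com/v1/items') is adapter)  # True
print(session.get_adapter('https://other.example.com/') is adapter)        # False
```

Because mounts are matched by longest prefix, we can give one host (or even one URL path) special behavior such as retries or pool sizes, while all other traffic uses the defaults.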

Summary

In this article, we learnt about creating sessions and using a session with different criteria. We also looked deeply into HTTP verbs. We learnt about streaming requests, dealing with SSL certificate verification, and streaming responses. We also got to know how to use prepared requests, link headers, and chunk-encoded requests.
