Python Requests

In Python, requests is a third-party library used to make HTTP requests; it is not part of the standard library, so it has to be installed first. It hides the complexity of making requests behind a simple API so that we can focus on interacting with services and consuming their data.

The requests library has many useful features. In this tutorial, let’s see how to use those features and customize them for different situations. We will also learn how to use requests efficiently and how to keep calls to external services from slowing down the application.

These are the key things you will learn from this tutorial:

  • Make requests using the most common HTTP methods
  • Customize your requests’ headers and data, using the query string and message body
  • Inspect data from your requests and responses
  • Make authenticated requests
  • Configure your requests to help prevent your application from backing up or slowing down

If you have a basic knowledge of HTTP, then this tutorial is for you. If you don’t, don’t worry; you can still follow along. Let’s get started.

Install requests:

Before we dive into the details, let’s install the requests library.

Type the below command in your terminal to install requests.

pip install requests

If you prefer pipenv, then use the below command:

pipenv install requests

Once it is installed, import it in your application to use it.

import requests

Now we have everything we need to jump into the tutorial. Let’s start by making GET requests.

The GET request:

GET and POST are the most popular HTTP methods. These methods determine what action we are going to perform while making a request. For instance, to get data from a webpage we use GET and to post data to a webpage, we use POST. We will be covering the other HTTP methods in the later part of this course.

To make a GET request, use

requests.get()

We can test this by making a GET request to GitHub’s root REST API:

import requests
print(requests.get('https://api.github.com'))
<Response [200]>

You have successfully made your first GET request. Now we have to understand the response returned by the request.

Response:

Whenever we make an HTTP request, we get a response as a result. It contains information on whether the request is successful or not.  In order to understand the response object, we need to have a look at its attributes and behaviors.

To do so, make a GET request but store the result in a variable.

import requests
response=requests.get('https://api.github.com')

The return value of get() is an instance of Response, which we are storing in a variable called response. Now we can inspect the result of the HTTP request using the response variable.

Status Code:

If you want to know whether your HTTP request was successful or not, the status code is the first thing to look at. The status code contains information on the status of the request.

For instance, status code 200 indicates that the request was successful, and 404 indicates that the requested resource was not found. Take a look at the list of status codes to understand the result of your request.

Check the status of the GET request we have made earlier. You can get the status code by

>>> response.status_code
200

The status code is 200, which means our GET request is successful and the response object contains the requested information.

You might face a situation where you have to make a decision based on the response.

if response.status_code==200:
    print('Request is Successful!')
elif response.status_code==404:
    print('Resource not found')

The above code will print the appropriate message to the console based on the status code of the response.

The requests library further simplifies the above code. If you use a Response in a conditional expression, it evaluates to True when the status code is below 400 and to False otherwise.

Let’s rewrite the above code using the conditional expression:

if response:
    print('Request is Successful!')
else:
    print('There is an error')

Here we are only checking that the status code is below 400. So status codes such as 204 NO CONTENT (the request succeeded but the message body has no content) and 304 NOT MODIFIED are also considered successful.

Hence use this shortened code only to check if the request is successful in general. And then handle the request based on status code if needed.
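To see which status codes count as truthy without making any network calls, here is a minimal sketch that builds bare Response objects by hand. The make_response helper is hypothetical and exists only for this illustration; in real code, responses come from requests.get() and the other request functions.

```python
import requests


def make_response(status_code):
    """Build an empty Response with the given status code (illustration only)."""
    response = requests.Response()
    response.status_code = status_code
    return response


# bool(response) is True whenever raise_for_status() would not raise.
truthiness = {
    code: bool(make_response(code))
    for code in (200, 204, 304, 404, 500)
}
print(truthiness)
```

The 2xx and 3xx codes come out truthy, while the 4xx and 5xx codes come out falsy.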

You can also raise an exception for an unsuccessful request using .raise_for_status(). For a successful response, no exception will be raised.

import requests
from requests.exceptions import HTTPError

urls=[
    'https://api.github.com',
    'https://api.github.com/valid_api'
]
for url in urls:
    try:
        response=requests.get(url)
        response.raise_for_status()
    except HTTPError as http_error:
        print(f'There is an HTTP Error : {http_error}')
    except Exception as error:
        print(f'Some other error : {error}')
    else:
        print('Request is Successful!')

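The .raise_for_status() method raises an HTTPError for 4xx and 5xx status codes and returns None otherwise. So for a successful request no exception is raised and the else block prints the ‘Request is Successful!’ message, while a URL that returns an error status ends up in the HTTPError branch.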
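The behavior of .raise_for_status() can be checked offline as well. The sketch below uses a hypothetical describe() helper and hand-built Response objects, so the reason and URL in the error message are empty; it is only meant to show when the exception is raised.

```python
import requests
from requests.exceptions import HTTPError


def describe(status_code):
    """Report what raise_for_status() does for a given status code."""
    response = requests.Response()
    response.status_code = status_code
    try:
        result = response.raise_for_status()
    except HTTPError as http_error:
        return f'HTTPError: {http_error}'
    return f'no exception, returned {result!r}'


for code in (200, 404, 503):
    print(code, '->', describe(code))
```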

Content:

Now we know how to read the status of a request from its status code. But there is a lot more to the response. You can actually see the data sent back by the server in the response body. Let’s see how to do that.

The response contains valuable information called payload in the body of the message. We can view the payload in different formats using the attributes and behavior of the response.

For instance, use .content to see the content of response in bytes.

response=requests.get('https://api.github.com')
print(response.content)
b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}"}'

If you want to convert the raw bytes of the payload into a string using a character encoding such as UTF-8, use .text, which does the job for us.

>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}"}'

The conversion from bytes to a string requires an encoding scheme. If we do not specify one, requests will guess it based on the headers of the response. You can also set .encoding explicitly before using .text.

>>> response.encoding='utf-8'
>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}"}'

The content of the response looks like a JSON object. You can deserialize it either by passing the string from .text to json.loads() or, more simply, by calling .json() on the response.

>>> response.json()
{'current_user_url': 'https://api.github.com/user', 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}', 'authorizations_url': 'https://api.github.com/authorizations', 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}', 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}'}

Since the return value is a dictionary, we can access the values using its key. You can do more with status codes and the body of the message. Still, if you want more information like metadata of the response, then you need to look at the header of the response.
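To recap how the three views of the body relate to each other, here is a standard-library sketch that mirrors, roughly, what .content, .text, and .json() give you. The payload bytes are a shortened sample made up for this illustration, not a real server response.

```python
import json

# Stand-in for response.content: the raw bytes of the body.
payload = b'{"current_user_url": "https://api.github.com/user"}'

# Stand-in for response.text: the bytes decoded with a character encoding.
text = payload.decode('utf-8')

# Stand-in for response.json(): the text parsed into Python objects.
data = json.loads(text)

print(type(payload).__name__, type(text).__name__, type(data).__name__)
print(data['current_user_url'])
```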

Response Header:

We can get a lot of useful information like the type of the response payload and the time limit to cache a response, from the response header.

>>> response.headers
{'date': 'Fri, 12 Jun 2020 09:03:08 GMT', 'content-type': 'application/json; charset=utf-8', 'server': 'GitHub.com', 'status': '200 OK', 'cache-control': 'public, max-age=60, s-maxage=60', 'vary': 'Accept, Accept-Encoding, Accept, X-Requested-With, Accept-Encoding', 'etag': 'W/"c6bac8870a7f94b08b440c3d5873c9ca"'}

The returned value is a dictionary-like object so that you can access the values using the key. For instance, to check the type of the response payload, you can get the value for the key ‘content-type’.

>>> response.headers['content-type']
'application/json; charset=utf-8'

The HTTP spec defines the headers to be case-insensitive. Hence we don’t need to worry about the case while accessing the response headers.

>>> response.headers['CONTENT-TYPe']
'application/json; charset=utf-8'

Irrespective of the case, the return value remains the same.
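The dictionary-like object behind response.headers is a CaseInsensitiveDict from requests.structures. A small sketch, using a hand-written header value rather than a live response, shows the lookup behavior:

```python
from requests.structures import CaseInsensitiveDict

# Sample header copied from the output above; no request is made here.
headers = CaseInsensitiveDict({'content-type': 'application/json; charset=utf-8'})

lower = headers['content-type']
upper = headers['CONTENT-TYPE']
mixed = headers.get('Content-Type')

print(lower == upper == mixed)
```

Lookups ignore case, but the object still remembers the original spelling of each key when you iterate over it.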

We have covered the useful attributes and behaviors of the response objects. Now let’s see how to customize the GET request and how the response is changed accordingly.

Query String:

Passing values through the query string of a URL is one of the most common ways to customize a GET request. We can pass these values to the params argument of get(). Let’s see an example that searches GitHub repositories for the requests library.

import requests

response = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
)

json_response=response.json()
repository_details=json_response['items'][0]
print(f'Name of the repository : {repository_details["name"]}')
print(f'Description of the repository : {repository_details["description"]}')

By changing the value passed to the params, you can modify the result of the response.

We have passed the data to params as a dictionary. You can also pass the values as:

  • A list of tuples:
response = requests.get(
    'https://api.github.com/search/repositories',
    params=[('q','requests+language:python')],
)
  • Bytes:
response = requests.get(
    'https://api.github.com/search/repositories',
    params=b'q=requests+language:python',
)
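If you are curious what URL requests actually builds from params, you can prepare a request without sending it. This is a sketch using requests.Request and its .prepare() method; the variable names are only for illustration.

```python
import requests

URL = 'https://api.github.com/search/repositories'

# Build the requests without sending them, then inspect the final URL.
from_dict = requests.Request(
    'GET', URL, params={'q': 'requests+language:python'}
).prepare()
from_tuples = requests.Request(
    'GET', URL, params=[('q', 'requests+language:python')]
).prepare()

print(from_dict.url)
print(from_dict.url == from_tuples.url)
```

Printing the URL shows exactly how special characters in the values were encoded, which helps when a search does not return what you expect.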

Request Headers:

By adding or modifying headers we can customize the HTTP request. In the get() method, pass the values to the headers parameter as a dictionary. By specifying the text-match media type in the Accept header, we can ask GitHub to highlight where the search term of the previous request matched.

import requests

response = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
    headers={'Accept': 'application/vnd.github.v3.text-match+json'},
)

json_response=response.json()
repository_details=json_response['items'][0]
print(f'Matching items : {repository_details["text_matches"]}')

The Accept header tells the server which content types our application can handle. The value ‘application/vnd.github.v3.text-match+json’ is a proprietary GitHub Accept header, and the content is a special JSON format.

Other HTTP methods:

As mentioned earlier, apart from GET, there are other popular HTTP methods: POST, PUT, PATCH, DELETE, HEAD and OPTIONS.

>>> requests.post('https://httpbin.org/post', data={'key':'value'})
>>> requests.put('https://httpbin.org/put', data={'key':'value'})
>>> requests.delete('https://httpbin.org/delete')
>>> requests.head('https://httpbin.org/get')
>>> requests.patch('https://httpbin.org/patch', data={'key':'value'})
>>> requests.options('https://httpbin.org/get')

We are making a request to the httpbin service using the above-mentioned HTTP methods. You can check the response for each of the methods similar to the way we did for GET method. The response for each of these methods contains headers, status code and more.

HEAD and DELETE:

The HEAD method requests only the headers from the server, without the message body.

>>> response = requests.head('https://httpbin.org/get')
>>> response.headers
{'Date': 'Sat, 13 Jun 2020 11:13:05 GMT', 'Content-Type': 'application/json', 'Content-Length': '308', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}

The DELETE method deletes the resource specified.

>>> response = requests.delete('https://httpbin.org/delete') 
>>> json_response=response.json() 
>>> json_response['args'] 
{}

POST, PUT, and PATCH:

The POST, PUT, and PATCH methods post the data to the server through the body of the message. We pass the payload in the data parameter of the function.

The POST method is used to create a new resource. The PUT and PATCH are used for full and partial update respectively.

We can pass the data as a dictionary, a list of tuples, bytes, or a file-like object. The type of the data depends on the specific needs of the service we are calling. For instance, if the content type of the request is application/x-www-form-urlencoded, you can send the data as a dictionary.

Use the json parameter instead of data to send JSON data. The requests library will add the corresponding Content-Type header for us when we use the json parameter. You can verify this from the response we received for the request.

import requests

response=requests.post('https://httpbin.org/post', json={'key':'value'})
json_response=response.json()
print(json_response['data'])
print(json_response['headers']['Content-Type'])
{"key": "value"}
application/json

PreparedRequest:

The requests library validates the headers and serializes the JSON content before sending a request to the destination server. With .request we can see this PreparedRequest. The PreparedRequest gives you information about the request such as the payload, URL, headers, and more.

import requests

response=requests.post('https://httpbin.org/post', json={'key':'value'})
print(response.request.url)
print(response.request.headers)
print(response.request.body)
https://httpbin.org/post
{'User-Agent': 'python-requests/2.23.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '16', 'Content-Type': 'application/json'}
b'{"key": "value"}'
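A PreparedRequest can also be built without sending anything, which makes it easy to compare what the data and json parameters produce. The following is a sketch using requests.Request and .prepare(); the variable names are illustrative.

```python
import json

import requests

URL = 'https://httpbin.org/post'

# Prepare (but do not send) the same payload as form data and as JSON.
as_form = requests.Request('POST', URL, data={'key': 'value'}).prepare()
as_json = requests.Request('POST', URL, json={'key': 'value'}).prepare()

print(as_form.headers['Content-Type'], as_form.body)
print(as_json.headers['Content-Type'], as_json.body)

# The JSON body round-trips back to the original dictionary.
decoded = json.loads(as_json.body)
```

With data, the dictionary is form-encoded; with json, it is serialized and the Content-Type header is set to application/json.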

Authentication:

So far we have seen how to make unauthenticated requests to public APIs. But many services require you to authenticate. We send our credentials through the Authorization header or a custom header defined by the server. The request functions provide the auth parameter to pass the credentials.

Let’s take a look at GitHub’s Authenticated User API, which provides information about the authenticated user’s profile. Pass the username and password as a tuple to get() to make a request to the Authenticated User API.

import requests
from getpass import getpass

print(requests.get('https://api.github.com/user', auth=('username', getpass())))
<Response [200]>

You need to pass valid credentials for the request to succeed. Otherwise you will get a 401 Unauthorized error.

>>> requests.get('https://api.github.com/user')
<Response [401]>

Whenever you pass a username and password as a tuple, requests applies the credentials using HTTP’s Basic access authentication scheme. Hence you can rewrite the above request by passing the credentials explicitly using HTTPBasicAuth.

import requests
from getpass import getpass
from requests.auth import HTTPBasicAuth

print(requests.get('https://api.github.com/user', auth=HTTPBasicAuth('username', getpass())))
<Response [200]>
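To confirm that the tuple form and HTTPBasicAuth produce the same Authorization header, you can prepare both requests offline and compare them. This is a sketch; 'username' and 'secret' are placeholder credentials, and the expected value is computed with the standard base64 module.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

URL = 'https://api.github.com/user'

# Placeholder credentials for illustration only; nothing is sent.
with_tuple = requests.Request('GET', URL, auth=('username', 'secret')).prepare()
with_class = requests.Request(
    'GET', URL, auth=HTTPBasicAuth('username', 'secret')
).prepare()

# Basic auth is "username:password" encoded with base64.
expected = 'Basic ' + base64.b64encode(b'username:secret').decode('ascii')

print(with_tuple.headers['Authorization'] == expected)
print(with_class.headers['Authorization'] == expected)
```

Because base64 is an encoding and not encryption, Basic credentials should only travel over HTTPS.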

The requests library also provides other authentication methods such as HTTPDigestAuth and HTTPProxyAuth.

You can also supply your own authentication mechanism by creating a subclass of AuthBase and implementing the __call__() method.

import requests
from requests.auth import AuthBase

class CustomTokenAuth(AuthBase):

    def __init__(self,token):
        self.token=token

    def __call__(self, request):
        request.headers['X-CustomTokenAuth']=f'{self.token}'
        return request

print(requests.get('https://httpbin.org/get', auth=CustomTokenAuth('sample_token123')))
<Response [200]>

Here we receive a token and add it to the X-CustomTokenAuth header of the request. But a poorly designed authentication mechanism introduces security vulnerabilities, so it’s always better to stick to tried-and-true auth schemes like Basic or OAuth.
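You can check that a custom auth class really attaches its header by preparing a request instead of sending it. The sketch below repeats the CustomTokenAuth class so that it runs on its own; the token is the same sample value used above.

```python
import requests
from requests.auth import AuthBase


class CustomTokenAuth(AuthBase):
    """Attach a token to every request through a custom header."""

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        request.headers['X-CustomTokenAuth'] = self.token
        return request


# Prepare the request without sending it and inspect the headers.
prepared = requests.Request(
    'GET', 'https://httpbin.org/get', auth=CustomTokenAuth('sample_token123')
).prepare()

print(prepared.headers['X-CustomTokenAuth'])
```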

SSL Certificate Verification:

Security is most important when we send and receive data from a webpage. Communication with secure sites over HTTPS uses a connection encrypted with SSL/TLS, and verifying the target server’s certificate is a critical part of establishing it.

The requests library does this for us by default. You can override this behavior by passing verify=False, in which case requests also emits a warning that you are making an insecure request.

>>> requests.get('https://api.github.com', verify=False)
C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
<Response [200]>

Performance:

We need to take care of the performance when using requests in a production environment. To keep the application running without any issues we need to take care of timeout control, sessions, and retry limits.

Timeout:

When we make a request to the external service, the system waits for the response. If it takes too long to get a response back, then it might cause a bad user experience or the backend application could hang.

By default, the requests library will wait indefinitely for a response. You can prevent this by specifying a timeout duration with the timeout parameter. It accepts an integer or a float representing the number of seconds to wait for a response before timing out.

>>> requests.get('https://api.github.com', timeout=3)
<Response [200]>
>>> requests.get('https://api.github.com', timeout=4.5)
<Response [200]>

The first request will time out after 3 seconds and the second will time out after 4.5 seconds.

The timeout parameter also accepts a tuple. The first value is the time allowed for the client to establish the connection, and the second value is the time to wait for a response after the connection has been established.

>>> requests.get('https://api.github.com', timeout=(3,4.5))
<Response [200]>

You will get a 200 response if the connection is established within 3 seconds and the response is received within 4.5 seconds after establishing the connection. If the request times out, requests raises a Timeout exception, which you can catch and handle.

import requests
from requests.exceptions import Timeout

try:
    response = requests.get('https://api.github.com', timeout=0.01)

except Timeout:
    print('Request timed out')
else:
    print('Request is successful')
Request timed out
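To see the difference between the connect and read parts of the timeout tuple without depending on an external service, here is a rough sketch that starts a deliberately slow server on localhost. SlowHandler and classify() are hypothetical names for this illustration, and the sleep and timeout values are arbitrary.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout


class SlowHandler(BaseHTTPRequestHandler):
    """Local stand-in for a slow external service."""

    def do_GET(self):
        time.sleep(1)  # answer more slowly than the client's read timeout
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'ok')
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the example output quiet


def classify(url, timeout):
    """Return which kind of timeout (if any) the request ran into."""
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy settings for localhost
        try:
            session.get(url, timeout=timeout)
        except ConnectTimeout:
            return 'connect timeout'
        except ReadTimeout:
            return 'read timeout'
        return 'ok'


server = HTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

# Connecting is fast, but the response takes longer than 0.2 seconds.
outcome = classify(url, timeout=(3, 0.2))
print(outcome)

server.shutdown()
server.server_close()
```

Both ConnectTimeout and ReadTimeout are subclasses of Timeout, so catching Timeout as in the example above covers either case.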

Session Object:

So far we have not worried about how connections are established when we use high-level request APIs like get() and post(). These functions are just abstractions of what is going on when we make a request. Underneath these abstractions is a class called Session. To improve the performance of your requests or to have finer control over them, you need to use a Session instance directly.

Sessions are used to persist parameters across requests. For example, you can use a session when you need the same authorization credentials for multiple requests.

import requests
from getpass import getpass

with requests.Session() as session:
    session.auth = ('username', getpass())

    response = session.get('https://api.github.com/user')

print(response.headers)
print(response.json())

Once the session object has been initialized with authentication credentials, it will persist the credentials each time we make a request with a session.

Whenever a session is used to make a connection, it keeps that connection in the connection pool. When there is a need for the same connection again, instead of establishing a new one, it will reuse the one from the connection pool. Hence the performance is optimized with a persistent connection.
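How session-level settings carry over to individual requests can be inspected offline with Session.prepare_request(). In this sketch the credentials are placeholders and the Accept header is just an example of a setting you might want on every call.

```python
import requests

URLS = (
    'https://api.github.com/user',
    'https://api.github.com/user/repos',
)

with requests.Session() as session:
    # Configure once; placeholder credentials for illustration only.
    session.auth = ('username', 'secret')
    session.headers.update({'Accept': 'application/vnd.github.v3+json'})

    # Merge the session settings into each request without sending it.
    prepared = [
        session.prepare_request(requests.Request('GET', url)) for url in URLS
    ]

for request in prepared:
    print(request.url, 'Authorization' in request.headers)
```

Every prepared request picks up the session's credentials and headers, which is what happens behind the scenes when you call session.get().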

Max Retries:

When a request fails with a ConnectionError, you may want your application to retry it. The requests library does not do this for us by default. The solution is to use a Transport Adapter.

With Transport Adapters you can define a set of configurations per service you are interacting with. For example, say you want all your requests to https://api.github.com to be retried up to 5 times before a ConnectionError is raised. Build a Transport Adapter, set its max_retries parameter to 5, and mount it to the session.

import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import ConnectionError

transport_adapter = HTTPAdapter(max_retries=5)

session = requests.Session()

session.mount('https://api.github.com', transport_adapter)

try:
    session.get('https://api.github.com')
except ConnectionError as error:
    print(error)

By mounting transport_adapter to the session instance, the session will adhere to its configuration for each request made to https://api.github.com.
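Mounting works by URL prefix, which you can verify with Session.get_adapter() without sending anything. The sketch below also passes a urllib3 Retry object instead of a plain integer; backoff_factor and status_forcelist are optional extras that are not part of the example above, and the values chosen here are arbitrary.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 5 times, pausing between attempts, and also retry on
# these gateway-style status codes (illustrative values).
retry_policy = Retry(total=5, backoff_factor=0.5, status_forcelist=[502, 503, 504])
github_adapter = HTTPAdapter(max_retries=retry_policy)

session = requests.Session()
session.mount('https://api.github.com', github_adapter)

# URLs under the mounted prefix use the custom adapter; others do not.
uses_custom = session.get_adapter('https://api.github.com/user') is github_adapter
uses_default = session.get_adapter('https://httpbin.org/get') is not github_adapter

print(uses_custom, uses_default)
session.close()
```

A plain integer for max_retries only covers failed connections, so a Retry object is one way to go further if you also want backoff or retries on specific status codes.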

 
