JSON with Python


What is JSON?

JSON stands for JavaScript Object Notation. Its syntax is based on JavaScript object literals, so JSON text looks much like the code that creates a JavaScript object. Even so, JSON is distinct from JavaScript, and much JavaScript is not valid JSON. In this article, we will learn how to use JSON with Python.

JSON is a completely language-independent text format, yet it uses conventions that are familiar to programmers of the C family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. Because of this, JSON is considered an ideal data-interchange language.

JSON is often used to serialize and transmit structured data over a network connection. XML can serve the same purpose, but JSON is the more common choice for transmitting data between a server and a web application, largely because it is more lightweight. You can learn more about JSON on the official JSON website.

JSON Object:

A JSON object looks similar to a JavaScript object or a Python dictionary. Like them, a JSON object is an unordered set of name-value pairs surrounded by curly braces. Each pair consists of a name in double quotes, followed by a colon, followed by a value.

{
"Name":"John"
}

An object can contain multiple name-value pairs, and a value can itself be another object or an array. The pairs are separated by commas.

{
  "id": "0001",
  "type": "donut",
  "name": "Cake",
  "ppu": 0.55,
  "batters":
    {
      "batter":
        [
          { "id": "1001", "type": "Regular" },
          { "id": "1002", "type": "Chocolate" },
          { "id": "1003", "type": "Blueberry" },
          { "id": "1004", "type": "Devil's Food" }
        ]
    },
  "topping":
    [
      { "id": "5001", "type": "None" },
      { "id": "5002", "type": "Glazed" },
      { "id": "5005", "type": "Sugar" },
      { "id": "5007", "type": "Powdered Sugar" },
      { "id": "5006", "type": "Chocolate with Sprinkles" },
      { "id": "5003", "type": "Chocolate" },
      { "id": "5004", "type": "Maple" }
    ]
}

JSON supports the following data types: numbers, strings, booleans (true and false), null, arrays, and objects.

JSON with Python:

Let’s see how to work with JSON data in Python with some code examples. Python comes with a built-in json package that we can use to work with JSON data. We need to import json in our Python script to make use of this package.

import json

Serialization and Deserialization:

Serialization is the process of converting data into a series of bytes so that it can be stored or shared across a network; this is also called encoding. In this article, serialization means encoding Python objects as JSON.

Deserialization is the reverse process: decoding JSON data back into Python objects.

In simple terms, serialization means writing the data and deserialization means reading it back. Encoding is for writing the data, whereas decoding is for reading it.

Serializing JSON data:

Python’s built-in json package provides the dump() and dumps() functions for converting Python objects into JSON. Let’s see the difference between these two functions:

dump()                                                    | dumps()
Converts a Python object to JSON and writes it to a file  | Converts a Python object to a JSON string
The output file object must be passed as an argument      | No file argument is required
Writes the JSON to the file in chunks as it is encoded    | Builds the whole JSON string in memory and returns it
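
To make the comparison concrete, here is a minimal sketch that produces the same JSON text with both functions. It uses io.StringIO as a stand-in for a real file (an assumption made so the example has no side effects), and the sample data is made up for illustration.

```python
import io
import json

python_data = {"batter": [{"id": "1001", "type": "Regular"}]}

# dumps() returns the JSON text as a str.
json_string = json.dumps(python_data)

# dump() writes the same JSON text to a file-like object.
# io.StringIO stands in for a file opened with open(..., "w").
buffer = io.StringIO()
json.dump(python_data, buffer)

print(json_string)
print(buffer.getvalue() == json_string)  # True
```

With the same keyword arguments, both calls produce identical text; the only difference is where the text ends up.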

The below table explains how Python objects are converted to equivalent JSON objects.

Python object | Equivalent JSON value
dict          | object
list          | array
tuple         | array
str           | string
int           | number
float         | number
True          | true
False         | false
None          | null
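
The rows of this table can be checked directly. The sketch below encodes one made-up value per row with json.dumps() and prints the JSON text that results; the dictionary keys are only labels chosen for this example.

```python
import json

# One illustrative value for each row of the conversion table.
python_values = {
    "a_dict": {"key": "value"},
    "a_list": [1, 2],
    "a_tuple": (1, 2),
    "a_str": "text",
    "an_int": 7,
    "a_float": 0.55,
    "true": True,
    "false": False,
    "none": None,
}

# Encode every value separately so each result can be inspected.
encoded = {name: json.dumps(value) for name, value in python_values.items()}
for name, text in encoded.items():
    print(f"{name:8} -> {text}")
```

Note that the list and the tuple produce the same JSON array, which is why a tuple does not survive a round trip.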

dump() example:

Let’s see an example that converts a Python object to JSON and saves it to a file using the dump() method. This is our Python data:

python_data = {
    "batter": [
        {"id": "1001", "type": "Regular"},
        {"id": "1002", "type": "Chocolate"},
        {"id": "1003", "type": "Blueberry"},
        {"id": "1004", "type": "Devil's Food"}
    ]
}

And the Python code is:

with open('Data_file.json', 'w') as data_file:
    json.dump(python_data, data_file)

dump() takes two arguments:

  • python_data, the Python object to serialize
  • data_file, the file object opened for writing on Data_file.json, which is where the JSON is stored.

dumps() example:

The dumps() method converts a Python object to a JSON string.

json_string=json.dumps(python_data)

Here we do not pass a file argument as we did with dump(), because dumps() returns the string instead of writing it.

Keyword arguments for dump() and dumps():

indent:

Though JSON is easy to read, it becomes even easier to read when formatted properly. You can use an additional keyword argument called indent to change the indentation for nested structures. Execute the below code and notice the difference in the format with the usage of indent.

print(json.dumps(python_data))
print(json.dumps(python_data,indent=4))

separators:

Another keyword argument that we can use to change the formatting is separators. It is an (item_separator, key_separator) tuple. The default value is (', ', ': ') when indent is None. To get the most compact JSON, use (',', ':'), which eliminates the whitespace.

print(json.dumps(python_data))
print(json.dumps(python_data,separators=(',',':')))
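
For reference, the sketch below shows what the default, compact, and indented forms look like for a small made-up record. All three strings decode to the same Python data; only the whitespace differs.

```python
import json

python_data = {"batter": [{"id": "1001", "type": "Regular"}]}

default_text = json.dumps(python_data)
compact_text = json.dumps(python_data, separators=(",", ":"))
pretty_text = json.dumps(python_data, indent=4)

print(default_text)  # {"batter": [{"id": "1001", "type": "Regular"}]}
print(compact_text)  # {"batter":[{"id":"1001","type":"Regular"}]}
print(pretty_text)   # the same data spread over several indented lines

# The compact form is the shortest of the three.
print(len(compact_text) < len(default_text) < len(pretty_text))  # True
```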

Refer to the docs for an additional list of keyword arguments.
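
Two of those additional keyword arguments are sort_keys and ensure_ascii. The short sketch below shows their effect on a made-up record; it is an illustration, not a complete tour of the options.

```python
import json

record = {"type": "Café au lait", "id": "2001"}

# By default, keys keep their insertion order and non-ASCII
# characters are escaped.
default_text = json.dumps(record)
print(default_text)  # {"type": "Caf\u00e9 au lait", "id": "2001"}

# sort_keys=True orders the keys alphabetically;
# ensure_ascii=False writes non-ASCII characters as they are.
tidy_text = json.dumps(record, sort_keys=True, ensure_ascii=False)
print(tidy_text)  # {"id": "2001", "type": "Café au lait"}
```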

Deserializing JSON Data:

Deserialization converts the JSON objects to their respective Python objects. We can use load() and loads() for deserializing.

load() gets its data from a file, whereas loads() gets its data from a string.

The table below shows how JSON values are converted to their respective Python objects:

JSON value    | Python object
object        | dict
array         | list
string        | str
null          | None
number (int)  | int
number (real) | float
true          | True
false         | False
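
The table above can be verified with a few lines of code. This sketch parses a made-up JSON object that contains one value of each kind and records the Python type of every parsed value. Note that a number written with a fraction or an exponent becomes a float.

```python
import json

# A made-up JSON object with one value for each row of the table.
sample = """
{
  "count": 3,
  "ppu": 0.55,
  "big": 1e3,
  "ok": true,
  "missing": null,
  "name": "Cake",
  "tags": ["a", "b"],
  "nested": {"id": "1001"}
}
"""

parsed = json.loads(sample)

# Record the Python type name of each decoded value.
type_names = {key: type(value).__name__ for key, value in parsed.items()}
for key, type_name in type_names.items():
    print(f"{key}: {type_name}")
```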

This table is not the exact opposite of the table we have seen in serialization. That’s because when we encode an object, we may not get the same object back after decoding.

In the example below, we encode a tuple. The JSON equivalent of a tuple is an array, and decoding an array gives a list. So encoding and then decoding a tuple results in a list.

import json

input_data=('a','b',1,2,3)
encoded_data=json.dumps(input_data)
decoded_data=json.loads(encoded_data)
print(input_data==decoded_data)
print(type(input_data))
print(type(decoded_data))
print(input_data==tuple(decoded_data))
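
Tuples are not the only values that change in a round trip. Another common case, sketched below with made-up data, is a dictionary with non-string keys: JSON object names are always strings, so integer keys come back as strings.

```python
import json

# Dictionary keys that are not strings are converted to strings on encoding.
original = {1: "one", 2: "two"}
encoded = json.dumps(original)
decoded = json.loads(encoded)

print(encoded)              # {"1": "one", "2": "two"}
print(decoded)              # {'1': 'one', '2': 'two'}
print(original == decoded)  # False

# Converting the keys back by hand restores the original dictionary.
restored = {int(key): value for key, value in decoded.items()}
print(original == restored)  # True
```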

load() example:

The load() method reads JSON from a file and converts it to a Python object. Remember that we created Data_file.json while serializing the data. Let’s use the same file here.

import json

with open('Data_file.json','r') as filename:
    data=json.load(filename)

print(data)

If you don’t know the type of the data you are loading, remember that the result can be any of the Python types in the conversion table.
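
One way to cope with data of unknown shape is to check the type after loading. The helper below, describe(), is a hypothetical function written for this illustration; it uses loads() so that it needs no file, but the same check applies to the result of load().

```python
import json

def describe(json_text):
    """Parse json_text and report which kind of Python value came back."""
    value = json.loads(json_text)
    if isinstance(value, dict):
        return f"object with keys {sorted(value)}"
    if isinstance(value, list):
        return f"array with {len(value)} items"
    return f"scalar of type {type(value).__name__}"

print(describe('{"id": 1, "title": "demo"}'))  # object with keys ['id', 'title']
print(describe('[1, 2, 3]'))                   # array with 3 items
print(describe('"hello"'))                     # scalar of type str
print(describe('null'))                        # scalar of type NoneType
```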

loads() example:

loads() gets its input from a string instead of a file.

import json

json_data="""
{
      "batter":
        [
          { "id": "1001", "type": "Regular" },
          { "id": "1002", "type": "Chocolate" },
          { "id": "1003", "type": "Blueberry" },
          { "id": "1004", "type": "Devil's Food" }
        ]
    }
"""

print(json.loads(json_data))
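
When the string is not valid JSON, loads() raises json.JSONDecodeError. The sketch below triggers the error on purpose with single-quoted names, which JSON does not allow, and reports where parsing failed. The input string is made up for this example.

```python
import json

# Names and strings in JSON must use double quotes, so this is invalid.
bad_json = "{'id': '1001', 'type': 'Regular'}"

try:
    json.loads(bad_json)
except json.JSONDecodeError as error:
    # lineno, colno and msg describe where and why parsing stopped.
    message = f"Invalid JSON at line {error.lineno}, column {error.colno}: {error.msg}"
    print(message)
```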

Real-time example:

To practice, let’s use JSONPlaceholder, which provides sample JSON data. We can get the data from the JSONPlaceholder service by making an API request to it. To make API calls we can use ‘requests’, a third-party package that is not part of the standard library and can be installed with pip install requests.

Create a Python file with any name, in any location you like, and import the packages below.

import json
import requests

We need to make an API request to the /todos endpoint of the JSONPlaceholder service. When the API call is successful, we get 200 as the status code.

response = requests.get("https://jsonplaceholder.typicode.com/todos")
print(response.status_code)

To deserialize the text attribute of the response object, we can use either json.loads() or the response’s json() method. As we have seen above, loads() is a function of the json module that parses any JSON string, so it also works on the text of a response.

response = requests.get("https://jsonplaceholder.typicode.com/todos")
print(json.loads(response.text))

.json() is a method of the class requests.models.Response. It returns the parsed JSON data from the response.

response = requests.get("https://jsonplaceholder.typicode.com/todos")
print(response.json())

The output of json.loads() and .json() is the same.

response = requests.get("https://jsonplaceholder.typicode.com/todos")
print(response.json()==json.loads(response.text))

If you find it tedious to jump between your Python file and the terminal every time you change the code, it is convenient to run your script in interactive mode by passing -i when you run it. The advantage is that, besides running the script, it leaves all the data from the script accessible in the terminal.

PS C:\Users\Gopi\Desktop\backup\myPython\freelancer> python -i code1.py
>>> json.loads(response.text)==response.json()
True
>>> response.status_code
200

The response.json() method returns a list. You can do all the list operations with the returned item.

response = requests.get("https://jsonplaceholder.typicode.com/todos")
todos=response.json()
print(type(todos))
print(todos[10])

We print one item to get an idea of what a todo item looks like. You can also view the items by visiting the endpoint in a browser.

{
  'userId': 1, 
  'id': 11, 
  'title': 'vero rerum temporibus dolor', 
  'completed': True
}


Find the users who completed maximum todos:

If you look at the data, you will find that there are multiple users, each with a unique user ID, and that each todo item has a property called ‘completed’. Let’s find out which users completed the most tasks.

import requests

todos_completed_by_each_user={}

response = requests.get("https://jsonplaceholder.typicode.com/todos")
todos = response.json()

for todo in todos:
    if todo['completed']:
        try:
            todos_completed_by_each_user[todo['userId']]+=1
        except KeyError:
            todos_completed_by_each_user[todo['userId']]=1

# Sort the users by number of completed todos, in descending order
max_todos_by_user = sorted(todos_completed_by_each_user.items(),
                   key=lambda x: x[1], reverse=True)

# Get the maximum number of todos completed
max_todos = max_todos_by_user[0][1]

users=[]

#Gets the list of users who have completed the maximum todos
for user,no_of_todos in max_todos_by_user:
    if no_of_todos<max_todos:
        continue
    users.append(str(user))

max_users = " and ".join(users)
print(max_users)

Now we have got the list of users who have completed the maximum number of todos. Let’s try to print the output in a nice way.

s='s' if len(users)>1 else ""
print(f'User{s} {max_users} completed {max_todos} TODOs')

The output will be like:

Users 5 and 10 and 8 completed 12 TODOs
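
The counting step can also be written with collections.Counter from the standard library. This is an alternative sketch, not the approach used above, and it runs on a small made-up list shaped like the /todos response so that it needs no network access.

```python
from collections import Counter

# Made-up sample in the same shape as the /todos response.
sample_todos = [
    {"userId": 1, "id": 1, "title": "a", "completed": True},
    {"userId": 1, "id": 2, "title": "b", "completed": False},
    {"userId": 2, "id": 3, "title": "c", "completed": True},
    {"userId": 2, "id": 4, "title": "d", "completed": True},
    {"userId": 3, "id": 5, "title": "e", "completed": True},
    {"userId": 3, "id": 6, "title": "f", "completed": True},
]

# Count completed todos per user.
completed_counts = Counter(
    todo["userId"] for todo in sample_todos if todo["completed"]
)

# Find the highest count, then every user who reached it.
max_todos = max(completed_counts.values())
top_users = sorted(
    user for user, count in completed_counts.items() if count == max_todos
)

print(top_users, max_todos)  # [2, 3] 2
```

Counter removes the need for the try/except block because missing keys start at zero.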

Now let’s create a JSON file called “filtered_todos.json” that contains the completed todos of the users who have completed the maximum number of todos.

import json

def filtered_todos(todo):
    # Keep a todo only if it is completed and belongs to a top user.
    has_max_count = str(todo["userId"]) in users
    is_complete = todo["completed"]
    return is_complete and has_max_count

# Write filtered TODOs to file.
with open("filtered_todos.json", "w") as data_file:
    # Use a separate name so the function above is not overwritten.
    max_user_todos = list(filter(filtered_todos, todos))
    json.dump(max_user_todos, data_file, indent=2)

If you are not familiar with filter(): it takes a function and a sequence, calls the function on each element, and keeps only the elements for which the function returns a true value.
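
As a small standalone illustration of filter(), the sketch below keeps only the completed items of a made-up todo list. The function name is_completed and the sample data are chosen just for this example.

```python
# Made-up todo items for illustration.
sample_todos = [
    {"userId": 1, "title": "write report", "completed": True},
    {"userId": 1, "title": "send invoice", "completed": False},
    {"userId": 2, "title": "book tickets", "completed": True},
]

def is_completed(todo):
    """Return True for todos that should be kept."""
    return todo["completed"]

# filter() returns a lazy iterator, so wrap it in list() to get the items.
completed = list(filter(is_completed, sample_todos))
titles = [todo["title"] for todo in completed]
print(titles)  # ['write report', 'book tickets']
```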

Here filter() uses a function called filtered_todos(), which checks whether the todo’s user ID is in the list of users who completed the maximum number of todos, and whether the todo itself is completed. If both conditions are true, the todo is kept, and we write the kept todos to our output file using the dump() method.

The output file “filtered_todos.json” contains only the completed todos of the users who have completed the maximum number of todos.

Serializing and De-serializing custom Python objects:

Let’s create our own custom python object.

class EmployeeDetails:
    def __init__(self,firstname,lastname,age):
        self.firstname = firstname
        self.lastname = lastname
        self.age=age

    def get_name(self):
        return self.firstname+' '+self.lastname

emp=EmployeeDetails('John','Terry',29)
emp_name=emp.get_name()

We have created a class called EmployeeDetails with a method that returns the name of the employee, and an object emp of that class.

Now try to serialize the emp_name variable and see what happens.

PS C:\Users\Gopi\Desktop\backup\myPython\freelancer> python -i sample.py
John Terry
>>> json.dumps(emp_name)
'"John Terry"'

We face no issues when serializing emp_name, which is a string. See what happens when we try to serialize the custom Python object emp that we created.

>>> json.dumps(emp)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'EmployeeDetails' is not JSON serializable

We get the error that an object of type 'EmployeeDetails' is not JSON serializable. Even though the json module can encode most of Python’s built-in data types, it doesn’t know how to serialize custom Python objects.

Simplifying Data structures:

Instead of trying to encode a custom object directly, we can convert it to some other representation that the json module understands, and then convert that to JSON.

Let’s understand how it works with the help of an example. To represent complex numbers, Python provides an in-built data type “complex”.  A complex number is expressed in the format a+bj where ‘a’ is the real part and ‘b’ is the imaginary part.

>>> import json
>>> z=2+5j
>>> type(z)
<class 'complex'>
>>> json.dumps(z)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'complex' is not JSON serializable

Since complex numbers are not JSON serializable, let’s convert the complex number to a format that the json module understands.

To create a complex number, what we need is only the real part and the imaginary part.

>>> z.real
2.0
>>> z.imag
5.0

When we pass the real and the imaginary part to a complex constructor, we will get a complex number.

>>> complex(2,5)==z
True

It is important to understand how to break a custom data type down into its essential components in order to serialize and deserialize it.

Encoding Custom data types:

Now we have all the essential components of a complex number. To convert the complex number to JSON, we can write our own encoding function and pass it to dump() or dumps() through the default parameter.

Below is the function we can use to encode complex numbers. If we pass any other custom data type to the function, it raises a TypeError.

def encode_complex_numbers(z):
    if isinstance(z,complex):
        return (z.real,z.imag)
    else:
        type_name=z.__class__.__name__
        raise TypeError(f'Object of type {type_name} is not JSON serializable')

Whenever dumps() meets an object that is not JSON serializable, it calls our encoding function.

>>> json.dumps(2+5j,default=encode_complex_numbers)
'[2.0, 5.0]'
>>> json.dumps(emp,default=encode_complex_numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\Gopi\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "sample.py", line 18, in encode_complex_numbers
    raise TypeError(f'Object of type {type_name} is not JSON serializable')
TypeError: Object of type EmployeeDetails is not JSON serializable

Notice that we can now serialize complex numbers using our ‘encode_complex_numbers’ function. For any other custom data type, we get a TypeError. You might wonder how we know that the error comes from our code. Look at the traceback line that mentions sample.py: that is the file containing my encoding function, and you will see your own filename in that place. Our function gets called instead of the encoder’s default() method.

Another thing to notice is that we return the real and imaginary parts of the complex number in a tuple, which the json module understands.

Instead of passing a function through the default parameter, the alternative approach is to create a subclass of the standard JSONEncoder class, here called ‘ComplexEncoder’, and override its default() method.

class ComplexEncoder(json.JSONEncoder):
    def default(self,z):
        if isinstance(z,complex):
            return (z.real,z.imag)
        else:
            return super().default(z)

We are not raising the TypeError ourselves; we let the base class handle it. Now we have two options for encoding complex numbers: we can either pass the ComplexEncoder class to dumps() through the cls parameter, or create an instance of it and call its encode() method.

>>> json.dumps(2+5j,cls=ComplexEncoder)
'[2.0, 5.0]'
>>> encoder=ComplexEncoder()
>>> encoder.encode(2+5j)
'[2.0, 5.0]'
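
The same technique applies to the EmployeeDetails class from earlier. The sketch below assumes that a plain dictionary of the three attributes is an acceptable JSON representation; encode_employee is a hypothetical helper written for this illustration.

```python
import json

class EmployeeDetails:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age

def encode_employee(obj):
    """Hypothetical helper: turn an EmployeeDetails into a plain dict."""
    if isinstance(obj, EmployeeDetails):
        return {
            "firstname": obj.firstname,
            "lastname": obj.lastname,
            "age": obj.age,
        }
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

emp = EmployeeDetails("John", "Terry", 29)
emp_json = json.dumps(emp, default=encode_employee)
print(emp_json)  # {"firstname": "John", "lastname": "Terry", "age": 29}
```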

Decoding Custom data types:

Even though we have the real and imaginary parts of a complex number, is that sufficient to recreate the complex number? No. Let’s see what we get back when we decode the output of our encoding function.

>>> json_format=json.dumps(2+5j,cls=ComplexEncoder)
>>> json.loads(json_format)
[2.0, 5.0]

What we want is a complex number, but what we actually got is just a list. To recreate the complex number, we have to pass the values to the complex constructor, and nothing in the JSON tells the decoder to do that. What we have missed is the metadata.

Understanding Metadata:

Metadata means the minimum amount of information that is sufficient and necessary to recreate the object.

The json module needs every custom data type to be represented in a format that it understands. Let’s create a file called complex_data.json and add the following object to it. It represents a complex number along with its metadata.

{
    "__complex__": true,
    "real": 2,
    "imag": 5
}

Here the key __complex__ is the metadata we are looking for. You can assign any value to the __complex__ key; it doesn’t matter. All we are going to do is verify that the key exists.

def decode_complex_number(dct):
    if '__complex__' in dct:
        return complex(dct['real'],dct['imag'])
    return dct

We are writing our own function to decode the complex number, as we did for encoding. Any other object is returned unchanged, as the plain dictionary the default decoder produces.

Whenever load() or loads() is called, we want our own function to decode the data instead of leaving it to the base decoder. To achieve this, we pass our decoding function to the object_hook parameter.

with open('complex_data.json','r') as data_file:
    data = data_file.read()
    z=json.loads(data,object_hook=decode_complex_number)

The output will be:

>>> z
(2+5j)
>>> type(z)
<class 'complex'>

The object_hook parameter is similar to the default parameter of dumps(). If you have more than one complex number, you can add them to complex_data.json as well and run the code once again.

[
  {
      "__complex__": true,
      "real": 2,
      "imag": 5
  },
  {
      "__complex__": true,
      "real": 7,
      "imag": 10
  },
  {
      "__complex__": true,
      "real": 13,
      "imag": 15
  }
]

Now our output will be a list of complex numbers.

>>> z
[(2+5j), (7+10j), (13+15j)]
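
So far the file with the __complex__ marker was written by hand. For a full round trip, the encoder has to emit the same marker. The sketch below assumes a hypothetical encoder, encode_complex_with_metadata, that returns a dictionary instead of a tuple, and pairs it with the decoding function shown earlier.

```python
import json

def encode_complex_with_metadata(z):
    """Hypothetical encoder that emits the __complex__ marker."""
    if isinstance(z, complex):
        return {"__complex__": True, "real": z.real, "imag": z.imag}
    raise TypeError(f"Object of type {type(z).__name__} is not JSON serializable")

def decode_complex_number(dct):
    # Rebuild a complex number whenever the marker key is present.
    if "__complex__" in dct:
        return complex(dct["real"], dct["imag"])
    return dct

numbers = [2 + 5j, 7 + 10j]
encoded = json.dumps(numbers, default=encode_complex_with_metadata)
decoded = json.loads(encoded, object_hook=decode_complex_number)

print(encoded)
print(decoded == numbers)  # True
```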

Just as we added our own encoder class under JSONEncoder, we can create our own decoder class under JSONDecoder instead of passing object_hook on every call.

class ComplexDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        # Register our hook so it is called for every decoded JSON object.
        super().__init__(object_hook=self.object_hook, *args, **kwargs)

    def object_hook(self, dct):
        if '__complex__' in dct:
            return complex(dct['real'], dct['imag'])
        # Leave every other JSON object as a plain dict.
        return dct

with open('complex_data.json','r') as data_file:
    data = data_file.read()
    z=json.loads(data,cls=ComplexDecoder)
    print(z)

Conclusion:

We have read about JSON with Python in this article. Let’s recap the important points we have learned so far.

  • Import the built-in json package
  • Read the data using load() or loads()
  • Process the data
  • Write the processed data using dump() or dumps()
  • If you’re handling a custom data type, identify the metadata and write your own encoder and decoder to write and read the data.