In this article we will discuss how to convert JSON to a Pandas DataFrame in Python.


Introduction

All data science projects begin with accessing the data and reading it correctly. With so many APIs available for querying large volumes of data from a variety of sources, JSON has become a popular format for project data.

Since you are working in Python, you will most likely want your data as a list or a DataFrame.

Let’s see how to quickly convert JSON to a Pandas DataFrame in Python.

To follow this tutorial, you will need two Python libraries: json (part of Python’s standard library) and pandas.

If you don’t have pandas installed, open “Command Prompt” (on Windows) or a terminal (on macOS/Linux) and install it with the following command:


pip install pandas

Make sure your Pandas version is >= 1.0.3. You can check it by running:


import pandas as pd

print(pd.__version__)

If your version is older than 1.0.3, update pandas by running the following command in your Command Prompt (or Terminal):


pip install --upgrade pandas

Create a Sample JSON File

As a first step, we will create two sample JSON files that we will later convert to Pandas DataFrames.

The first file will be a very simple one:


[
    {
        "userId": 1,
        "firstName": "Jake",
        "lastName": "Taylor",
        "phoneNumber": "123456",
        "emailAddress": "john.smith@example.com"
    },
    {
        "userId": 2,
        "firstName": "Brandon",
        "lastName": "Glover",
        "phoneNumber": "123456",
        "emailAddress": "brandon.glover@example.com"
    }
]

Let’s save it as sample.json in the same location as your Python code.

And the second file will be a nested JSON file:


[
    {
        "userId": 1,
        "firstName": "Jake",
        "lastName": "Taylor",
        "phoneNumber": "123456",
        "emailAddress": "john.smith@example.com",
        "courses": {
            "course1": "mathematics",
            "course2": "physics",
            "course3": "engineering"
        }
    },
    {
        "userId": 2,
        "firstName": "Brandon",
        "lastName": "Glover",
        "phoneNumber": "123456",
        "emailAddress": "brandon.glover@example.com",
        "courses": {
            "course1": "english",
            "course2": "french",
            "course3": "sociology"
        }
    }
]

Let’s save it as nested_sample.json in the same location as your Python code.
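If you would rather not create the two files by hand, here is a minimal sketch that writes both of them with the json module. It assumes you run it from the same directory as the rest of your code; the names SIMPLE_USERS and NESTED_USERS are only illustrative.

```python
import json

# Flat records with the same fields as sample.json
SIMPLE_USERS = [
    {"userId": 1, "firstName": "Jake", "lastName": "Taylor",
     "phoneNumber": "123456", "emailAddress": "john.smith@example.com"},
    {"userId": 2, "firstName": "Brandon", "lastName": "Glover",
     "phoneNumber": "123456", "emailAddress": "brandon.glover@example.com"},
]

# Nested records: copy each flat record and append a "courses" object
NESTED_USERS = [
    {**SIMPLE_USERS[0], "courses": {"course1": "mathematics",
                                    "course2": "physics",
                                    "course3": "engineering"}},
    {**SIMPLE_USERS[1], "courses": {"course1": "english",
                                    "course2": "french",
                                    "course3": "sociology"}},
]

# Write both files; indent=4 keeps them human-readable
with open("sample.json", "w", encoding="utf-8") as f:
    json.dump(SIMPLE_USERS, f, indent=4)

with open("nested_sample.json", "w", encoding="utf-8") as f:
    json.dump(NESTED_USERS, f, indent=4)
```

Building the nested records from the flat ones keeps the two files consistent, and indent=4 produces an indented layout similar to the one shown above.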


Convert simple JSON to Pandas DataFrame in Python

Reading a simple JSON file is straightforward with the Pandas read_json() function. Given a file path, it parses the JSON and converts it to a DataFrame:


import pandas as pd

df = pd.read_json("sample.json")

Let’s take a look at the resulting DataFrame:


print(df)

Each JSON object becomes a row and each key becomes a column, so the DataFrame has two rows and five columns: userId, firstName, lastName, phoneNumber and emailAddress.
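read_json() also accepts a file-like object, which is useful when the JSON arrives as text (for example, the body of an API response) rather than as a file. Passing a literal JSON string directly is deprecated as of pandas 2.1, so the sketch below wraps the text in io.StringIO. The raw variable is hypothetical sample data, and orient="records" simply states that the top level is a list of row objects.

```python
import io

import pandas as pd

# Hypothetical JSON text, e.g. the body of an API response
raw = """
[
    {"userId": 1, "firstName": "Jake", "lastName": "Taylor"},
    {"userId": 2, "firstName": "Brandon", "lastName": "Glover"}
]
"""

# Wrap the string in a file-like object instead of passing it directly
df = pd.read_json(io.StringIO(raw), orient="records")

print(df.shape)  # (2, 3)
print(df)
```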


Convert nested JSON to Pandas DataFrame in Python

When you compare nested_sample.json with sample.json, you can see that the nested file has a different structure: we added the courses field, which contains a nested object (a dictionary mapping course1, course2 and course3 to course names) rather than a single value.
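To see why this structure needs special handling, here is a small sketch of what read_json() does with nested data. It uses a hypothetical in-memory record shaped like nested_sample.json: the inner object is not flattened but stored as a Python dict inside a single courses cell.

```python
import io

import pandas as pd

# Hypothetical nested record with the same shape as nested_sample.json
raw = """
[
    {"userId": 1, "firstName": "Jake",
     "courses": {"course1": "mathematics", "course2": "physics"}}
]
"""

df = pd.read_json(io.StringIO(raw), orient="records")

# read_json does not flatten: the whole inner object sits in one cell
print(df.columns.tolist())
print(df.loc[0, "courses"])  # {'course1': 'mathematics', 'course2': 'physics'}
```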

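In this case, to convert it to a Pandas DataFrame we will use the pd.json_normalize() function. Unlike read_json(), which keeps each nested object in a single cell, json_normalize() normalizes semi-structured JSON into a flat table, turning each nested key into its own column. Because json_normalize() works on Python objects (dicts and lists) rather than a file path, we first load the file with the json module: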


import pandas as pd
import json

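# json.load() parses the file object directly into a list of dicts
with open('nested_sample.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Flatten the nested "courses" object into separate columns
df = pd.json_normalize(data)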

Let’s take a look at the flattened DataFrame:


print(df)

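The nested courses object has been flattened: instead of a single courses column, the DataFrame has the columns courses.course1, courses.course2 and courses.course3 alongside the top-level fields.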
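pd.json_normalize() also takes parameters that help with deeper or messier JSON. The sketch below uses hypothetical in-memory records shaped like nested_sample.json to show the default dot-separated column names, the sep parameter for choosing a different separator, and max_level for limiting how deep the flattening goes.

```python
import pandas as pd

# Hypothetical in-memory records with the same shape as nested_sample.json
data = [
    {"userId": 1, "firstName": "Jake",
     "courses": {"course1": "mathematics", "course2": "physics"}},
    {"userId": 2, "firstName": "Brandon",
     "courses": {"course1": "english", "course2": "french"}},
]

# Default: nested keys are joined to their parent key with "."
flat = pd.json_normalize(data)
print(flat.columns.tolist())
# ['userId', 'firstName', 'courses.course1', 'courses.course2']

# sep="_" produces names such as courses_course1 instead
flat_underscore = pd.json_normalize(data, sep="_")

# max_level=0 stops at the top level, leaving "courses" as dicts
shallow = pd.json_normalize(data, max_level=0)
```

An underscore separator is often convenient because a name like courses_course1 can also be used as an attribute (flat_underscore.courses_course1), which a dotted name cannot.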


Conclusion

In this article we discussed how to convert both simple and nested JSON to a Pandas DataFrame in Python using the json and pandas libraries: read_json() for flat JSON and json_normalize() for nested JSON.

Feel free to leave comments below if you have any questions or have suggestions for some edits and check out more of my Python Programming articles.