How to Convert JSON Data into a DataFrame

This short tutorial will guide you through the process of converting JSON data into a Pandas DataFrame.

How to Convert JSON Data into a DataFrame with Pandas


Image by Author | DALLE-3 & Canva
 

If you've ever had the chance to work with data, you've probably come across the need to load JSON files (short for JavaScript Object Notation) into a Pandas DataFrame for further analysis. JSON files store data in a format that is clear for people to read and also simple for computers to understand. However, JSON files can sometimes be complicated to navigate through. Therefore, we load them into a more structured format like DataFrames - that is set up like a spreadsheet with rows and columns.

I will show you two different ways to convert JSON data into a Pandas DataFrame. Before we discuss these methods, let's suppose this dummy nested JSON file that I'll use as an example throughout this article.

{
  "books": [
    {
      "title": "One Hundred Years of Solitude",
      "author": "Gabriel Garcia Marquez",
      "reviews": [
        {
          "reviewer": {
            "name": "Kanwal Mehreen",
            "location": "Islamabad, Pakistan"
          },
          "rating": 4.5,
          "comments": "Magical and completely breathtaking!"
        },
        {
          "reviewer": {
            "name": "Isabella Martinez",
            "location": "Bogotá, Colombia"
          },
          "rating": 4.7,
          "comments": "A marvelous journey through a world of magic."
        }
      ]
    },
    {
      "title": "Things Fall Apart",
      "author": "Chinua Achebe",
      "reviews": [
        {
          "reviewer": {
            "name": "Zara Khan",
            "location": "Lagos, Nigeria"
          },
          "rating": 4.9,
          "comments": "Things Fall Apart is the best of contemporary African literature."
        }
      ]
    }
  ]
}

The above-mentioned JSON data represents a list of books, where each book has a title, author, and a list of reviews. Each review, in turn, has a reviewer (with a name and location) and a rating and comments.
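Once parsed, this structure is just nested Python dictionaries and lists, so any value can be reached by plain indexing. Here is a quick sketch with a subset of the data inlined (rather than read from books.json) for illustration:

```python
import json

# A subset of the books.json structure, inlined so the sketch runs on its own
raw = (
    '{"books": [{"title": "Things Fall Apart", "author": "Chinua Achebe", '
    '"reviews": [{"reviewer": {"name": "Zara Khan", "location": "Lagos, Nigeria"}, '
    '"rating": 4.9, "comments": "A classic."}]}]}'
)
data = json.loads(raw)

# Drill down: first book -> first review -> reviewer name
name = data["books"][0]["reviews"][0]["reviewer"]["name"]
print(name)  # Zara Khan
```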

Method 1: Using the json.load() and pd.DataFrame() functions

The easiest and most straightforward approach is to use the built-in json.load() function to parse our JSON data. This converts it into a Python dictionary, from which we can create the DataFrame directly. However, this approach has a limitation: it only handles a single level of nesting. So, if you apply just these steps to our example:

import json
import pandas as pd

# Load the JSON data
with open('books.json', 'r') as f:
    data = json.load(f)

# Create a DataFrame from the JSON data
df = pd.DataFrame(data['books'])
df

Your output might look like this:

 

json.load() output



 

In the reviews column, you can see the entire list of review dictionaries dumped into a single cell. Therefore, if you want the output to be properly flattened, you have to handle the nested structure manually. This can be done as follows:

# Create a DataFrame from the nested JSON data
df = pd.DataFrame([
    {
        'title': book['title'],
        'author': book['author'],
        'reviewer_name': review['reviewer']['name'],
        'reviewer_location': review['reviewer']['location'],
        'rating': review['rating'],
        'comments': review['comments']
    }
    for book in data['books']
    for review in book['reviews']
])

Updated Output:
 

json.load() output



 

Here, we use a list comprehension to build a flat list of dictionaries, where each dictionary combines a book's information with one of its reviews. We then create the Pandas DataFrame from this flat list.

However, the issue with this approach is that it requires more manual effort to handle the nested structure of the JSON data. So, what now? Do we have any other option?

Totally! I mean, come on. Given that we're in the 21st century, facing such a problem without a solution seems unrealistic. Let's see the other approach.

Method 2 (Recommended): Using the json_normalize() function

The json_normalize() function from the Pandas library is a better way to manage nested JSON data. It automatically flattens the nested structure of the JSON data, creating a DataFrame from the resulting data. Let's take a look at the code:

import pandas as pd
import json

# Load the JSON data
with open('books.json', 'r') as f:
    data = json.load(f)

# Create the DataFrame using json_normalize()
df = pd.json_normalize(
    data=data['books'],
    meta=['title', 'author'],
    record_path='reviews',
    errors='raise'
)
df

Output:
 

json_normalize() output



 

The json_normalize() function takes the following parameters:

data: The input data, which can be a list of dictionaries or a single dictionary. In this case, it's the data dictionary loaded from the JSON file.

record_path: The path in the JSON data to the records you want to normalize. In this case, it's the 'reviews' key.

meta: Additional fields from the JSON document to include in the normalized output. In this case, we're using the 'title' and 'author' fields. Note that the meta columns appear at the end of the output; that's simply how the function works. It makes no difference for analysis, but if for some reason you want these columns at the front, you have to reorder them manually.

errors: The error handling strategy, which can be 'ignore' or 'raise'. We have set it to 'raise', so if any of the meta keys are missing during normalization, an exception will be raised.
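To illustrate the column-ordering point, here's a minimal sketch that reorders the result so the meta columns come first. It uses an inlined subset of the books data (rather than the file) so it runs on its own:

```python
import pandas as pd

# Inlined subset mirroring the books.json structure
data = {
    "books": [
        {
            "title": "Things Fall Apart",
            "author": "Chinua Achebe",
            "reviews": [
                {
                    "reviewer": {"name": "Zara Khan", "location": "Lagos, Nigeria"},
                    "rating": 4.9,
                    "comments": "A classic.",
                }
            ],
        }
    ]
}

df = pd.json_normalize(data=data["books"], record_path="reviews", meta=["title", "author"])

# The meta columns land at the end; slice the DataFrame to move them to the front
front = ["title", "author"]
df = df[front + [c for c in df.columns if c not in front]]
print(list(df.columns)[:2])  # ['title', 'author']
```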

Wrapping Up

Both of these methods have their own advantages and use cases, and the choice of method depends on the structure and complexity of the JSON data. If the JSON data is deeply nested, the json_normalize() function is usually the most suitable option, as it handles the nesting automatically. If the JSON data is relatively simple and flat, json.load() combined with pd.DataFrame() might be the easiest and most straightforward approach.
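For instance, when the records are flat to begin with, pd.DataFrame() needs no extra handling at all. A minimal sketch with inline data:

```python
import pandas as pd

# Flat records: no nesting, so pd.DataFrame converts them directly
records = [
    {"title": "One Hundred Years of Solitude", "author": "Gabriel Garcia Marquez"},
    {"title": "Things Fall Apart", "author": "Chinua Achebe"},
]
df = pd.DataFrame(records)
print(df.shape)  # (2, 2)
```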

When dealing with large JSON files, it's crucial to think about memory usage and performance since loading the whole file into memory might not work. So, you might have to look into other options like streaming the data, lazy loading, or using a more memory-efficient format like Parquet.
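As a sketch of the streaming idea: when the data is stored as newline-delimited JSON (one record per line), pandas can read it in chunks instead of all at once. Here the "file" is simulated with an in-memory buffer; in practice you would pass a path to a large .jsonl file:

```python
import io
import pandas as pd

# Simulate a newline-delimited JSON (JSON Lines) source; in practice this
# would be a file handle to a large .jsonl file on disk
jsonl = io.StringIO(
    '{"title": "Book A", "rating": 4.5}\n'
    '{"title": "Book B", "rating": 4.9}\n'
)

# With lines=True and chunksize set, read_json returns an iterator of
# DataFrames, so only one chunk is held in memory at a time
total_rows = 0
for chunk in pd.read_json(jsonl, lines=True, chunksize=1):
    total_rows += len(chunk)
print(total_rows)  # 2
```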


 

Kanwal Mehreen Kanwal is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She's also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
