How to Manage Nested JSON Objects as a DataFrame in Pandas?

Published on 2024-11-08

Reading Nested JSON with Nested Objects as a Pandas DataFrame

JSON data with nested objects or arrays usually has to be flattened before it can be analysed as a table in Python. Pandas provides json_normalize for exactly this purpose.
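As a minimal sketch of what flattening means here, the snippet below passes a single record with one nested object to json_normalize. The record and its stop field are hypothetical and not taken from the article's file; they only illustrate that nested keys turn into dotted column names.

```python
import pandas as pd

# Hypothetical record with one nested object (not from the article's file)
record = {"name": "R 3932", "stop": {"station": "Linz/Donau Hbf", "platform": "1A-B"}}

# Keys of the nested dict become dotted column names
flat = pd.json_normalize(record)
print(flat.columns.tolist())  # ['name', 'stop.station', 'stop.platform']
```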

Expanding the Array into Columns

To expand the locations array into separate columns, call json_normalize with locations as the record path and the top-level fields (date, number, name) as metadata:

import json
import pandas as pd

with open('myJson.json') as data_file:
    data = json.load(data_file)

df = pd.json_normalize(data, 'locations', ['date', 'number', 'name'], record_prefix='locations_')

print(df)

This will create a dataframe with expanded columns:

  locations_arrTime locations_arrTimeDiffMin locations_depTime  \
0                                                        06:32   
1             06:37                        1             06:40   
2             08:24                        1                     

  locations_depTimeDiffMin           locations_name locations_platform  \
0                        0  Spital am Pyhrn Bahnhof                  2   
1                        0  Windischgarsten Bahnhof                  2   
2                                    Linz/Donau Hbf               1A-B   

  locations_stationIdx locations_track number    name        date  
0                    0          R 3932         R 3932  01.10.2016  
1                    1                         R 3932  01.10.2016  
2                   22                         R 3932  01.10.2016 
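The article does not show the contents of myJson.json, so the sketch below uses an inline stand-in whose shape is inferred from the printed output (an assumption), trimmed to three fields per location. It spells out the same call with keyword arguments to make the role of each parameter visible.

```python
import pandas as pd

# Stand-in for the contents of myJson.json (assumed shape, trimmed fields)
data = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32", "platform": "2"},
        {"name": "Windischgarsten Bahnhof", "depTime": "06:40", "platform": "2"},
        {"name": "Linz/Donau Hbf", "depTime": "", "platform": "1A-B"},
    ],
}

df = pd.json_normalize(
    data,
    record_path="locations",          # one output row per element of this list
    meta=["date", "number", "name"],  # top-level fields repeated on every row
    record_prefix="locations_",       # keeps the location 'name' apart from the top-level 'name'
)
print(df.columns.tolist())
```

The prefix matters because both the top-level object and each location have a name key; with the prefix, the two end up in separate columns (locations_name and name).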

Handling Multiple JSON Objects

For JSON files containing multiple objects, the approach depends on the desired data structure.

Keep Individual Columns

To keep one row per record with its top-level columns (date, number, name) and collapse the location names into a single comma-separated locations column, use the following:

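df = pd.read_json('myJson.json')
# Each row holds one location dict; keep only its 'name' value
df['locations'] = pd.DataFrame(df['locations'].tolist())['name']
# Collapse to one row per (date, name, number), joining the station names with commas
df = df.groupby(['date', 'name', 'number'])['locations'].apply(','.join).reset_index()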

print(df)

This will group the data and concatenate the locations:

        date    name number                                          locations
0  2016-01-10  R 3932         Spital am Pyhrn Bahnhof,Windischgarsten Bahnho...
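The same summary can also be reached from the normalized table. The sketch below is one possible alternative, not the article's method: it uses an inline stand-in for myJson.json (its shape is an assumption inferred from the printed output), and the intermediate names rows and summary are illustrative.

```python
import pandas as pd

# Stand-in for the contents of myJson.json (assumed shape, trimmed fields)
data = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {"name": "Spital am Pyhrn Bahnhof"},
        {"name": "Windischgarsten Bahnhof"},
        {"name": "Linz/Donau Hbf"},
    ],
}

# One row per location, with the top-level fields repeated on each row
rows = pd.json_normalize(
    data, "locations", ["date", "number", "name"], record_prefix="locations_"
)

# One row per train, with the station names joined in their original order
summary = (
    rows.groupby(["date", "name", "number"])["locations_name"]
    .agg(",".join)
    .reset_index(name="locations")
)
print(summary)
```

Unlike the read_json route, this version leaves date as the original string, since json_normalize does no date conversion.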

Flatten the Data Structure

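If you prefer a fully flattened structure with one row per location, use json_normalize on the parsed data (loaded with json.load as in the first example). A record_prefix of 'locations.' gives dotted column names, and pd.to_datetime converts the date column:

df = pd.json_normalize(data, 'locations', ['number', 'date', 'name'], record_prefix='locations.')
df['date'] = pd.to_datetime(df['date'])  # '01.10.2016' is read month-first by default

# Show the metadata next to a few of the flattened location fields
print(df[['number', 'date', 'name', 'locations.arrTimeDiffMin', 'locations.depTimeDiffMin', 'locations.platform']])

This will output the data in a single table (shown here on one line; pandas may wrap wide output):

  number        date    name locations.arrTimeDiffMin locations.depTimeDiffMin locations.platform
0         2016-01-10  R 3932                                                 0                  2
1         2016-01-10  R 3932                        1                        0                  2
2         2016-01-10  R 3932                        1                                        1A-B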
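Note that the raw string 01.10.2016 appears as 2016-01-10 in the converted output, which means it was parsed month-first. If the dates in the file are actually written day.month.year (an assumption; the article does not say), an explicit format avoids the swap. A minimal sketch:

```python
import pandas as pd

dates = pd.Series(["01.10.2016"])

# Default parsing treats the first number as the month
month_first = pd.to_datetime(dates)

# An explicit format reads the string as day.month.year instead
day_first = pd.to_datetime(dates, format="%d.%m.%Y")

print(month_first.iloc[0].date())  # 2016-01-10
print(day_first.iloc[0].date())    # 2016-10-01
```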