Nested JSON to CSV with python pandas

·

2 min read

Nested JSON objects have one or more levels of additional objects or arrays. CSV, on the other hand, is a flat structure with rows and columns. The goal is to "flatten" the JSON structure, converting nested elements into a format that can be represented in columns. We can use pandas and json to flatten nested JSON and export it to a CSV file.

Example Nested JSON

[
    {
        "name": "John Doe",
        "email": "johndoe@example.com",
        "location": {
            "city": "New York",
            "country": "USA"
        },
        "skills": ["Python", "Data Analysis"]
    },
    {
        "name": "Jane Doe",
        "email": "janedoe@example.com",
        "location": {
            "city": "San Francisco",
            "country": "USA"
        },
        "skills": ["JavaScript", "React"]
    }
]

Install Required Libraries

Ensure you have pandas installed. If not, you can install it using pip:

pip install pandas

Load and flatten JSON

First, load your JSON data. You can load it from a file or directly as a string (for this example, we'll assume it's loaded into a variable).

import json
from pandas import json_normalize

with open('data.json') as file:
    data = json.load(file)

data = json.loads(json_string)

# Use pandas to normalize the nested JSON structure
flat_data = json_normalize(data, sep='_')
print(flat_data)

The sep parameter specifies the separator to use when flattening the JSON. This example uses an underscore to separate nested fields (e.g., "location_city").

Convert to CSV

Finally, convert the flattened DataFrame to a CSV file.

flat_data.to_csv('output.csv', index=False)

For deeply nested structures or arrays, you might need to preprocess the data before flattening it or use a more complex approach with custom functions to handle the specific structure of your JSON.