Nested JSON to CSV with python pandas
Nested JSON objects have one or more levels of additional objects or arrays. CSV, on the other hand, is a flat structure with rows and columns. The goal is to "flatten" the JSON structure, converting nested elements into a format that can be represented in columns. We can use pandas
and json
to flatten nested JSON and export it to a CSV file.
Example Nested JSON
[
{
"name": "John Doe",
"email": "johndoe@example.com",
"location": {
"city": "New York",
"country": "USA"
},
"skills": ["Python", "Data Analysis"]
},
{
"name": "Jane Doe",
"email": "janedoe@example.com",
"location": {
"city": "San Francisco",
"country": "USA"
},
"skills": ["JavaScript", "React"]
}
]
Install Required Libraries
Ensure you have pandas installed. If not, you can install it using pip:
pip install pandas
Load and flatten JSON
First, load your JSON data. You can load it from a file or directly as a string (for this example, we'll assume it's loaded into a variable).
import json
from pandas import json_normalize
with open('data.json') as file:
data = json.load(file)
data = json.loads(json_string)
# Use pandas to normalize the nested JSON structure
flat_data = json_normalize(data, sep='_')
print(flat_data)
The sep
parameter specifies the separator to use when flattening the JSON. This example uses an underscore to separate nested fields (e.g., "location_city").
Convert to CSV
Finally, convert the flattened DataFrame to a CSV file.
flat_data.to_csv('output.csv', index=False)
For deeply nested structures or arrays, you might need to preprocess the data before flattening it or use a more complex approach with custom functions to handle the specific structure of your JSON.