PySpark — Flatten JSON/Struct Data Frame dynamically
Flatten JSON data dynamically in tabular structure
3 min readOct 7, 2022
We always have use cases where we have to flatten the complex JSON/Struct Data Frame into flattened simple Data Frame just like the example below:
example.this.that => example_this_that
Following code snippet does the exact job dynamically. No manual effort required to expand the data structure or to determine the schema.
Lets first create an example Data Frame for the job
# Lets create an Example Data Frame to hold JSON data# Example Data Frame with column having JSON data
_data = [
['EMP001', '{"dept" : "account", "fname": "Ramesh", "lname": "Singh", "skills": ["excel", "tally", "word"]}'],
['EMP002', '{"dept" : "sales", "fname": "Siv", "lname": "Kumar", "skills": ["biking", "sales"]}'],
['EMP003', '{"dept" : "hr", "fname": "MS Raghvan", "skills": ["communication", "soft-skills"], "hobbies" : {"cycling": "expert", "computers":"basic"}}']
]# Columns for the data
_cols = ['emp_no', 'raw_data']# Lets create the raw Data Frame
df_raw = spark.createDataFrame(data = _data, schema = _cols)# Determine the schema of the JSON payload from the column…