PySpark — Flatten JSON/Struct Data Frame dynamically

We often have use cases where we need to flatten a complex, nested JSON/Struct Data Frame into a simple, flat Data Frame, as in the example below:

example.this.that => example_this_that

Flatten JSON/Struct Data Frame Data

The following code snippet does exactly this, dynamically: no manual effort is required to expand the data structure or to determine the schema.

Let's first create an example Data Frame for the job.

Create Example Data Frame

Next, let's create a Python function to do the magic.

Python function to do the magic

Now, let's run our example Data Frame through the Python function to get the flattened Data Frame.

Flattened Data Frame

If we also want to explode the Array data:

Exploded Data Frame

Check out the complete IPython Notebook on GitHub: https://github.com/subhamkharwal/ease-with-apache-spark/blob/master/5_flatten_json_data_dynamically.ipynb

Check out the EaseWithApacheSpark series: https://subhamkharwal.medium.com/learnbigdata101-spark-series-940160ff4d30

