PySpark — Merge Data Frames with different Schema

Subham Khandelwal
2 min read · Oct 8, 2022


In order to merge data from multiple systems, we often come across situations where we need to merge DataFrames that don't have the same columns, or whose columns appear in a different order.

union() and unionByName() are the two well-known methods that come into play when we want to merge two DataFrames. But there is a small catch.

union() works with column positions, i.e. both DataFrames must have the same columns in the same order. unionByName(), on the other hand, does the same job but matches columns by name. So, as long as both DataFrames have the same set of columns, we can merge them easily.

Let's check this out in action. First, we create our example DataFrames.

# Example DataFrame 1
_data = [
    ["C101", "Akshay", 21, "22-10-2001"],
    ["C102", "Sivay", 20, "07-09-2000"],
    ["C103", "Aslam", 23, "04-05-1998"],
]

_cols = ["ID", "NAME", "AGE", "DOB"]

df_1 = spark.createDataFrame(data=_data, schema=_cols)
df_1.printSchema()
df_1.show(10, False)
Example DataFrame 1
# Example DataFrame 2
_data = [
    ["C106", "Suku", "Indore", ["Maths", "English"]],
    ["C110", "Jack", "Mumbai", ["Maths", "English", "Science"]],
    ["C113", "Gopi", "Rajkot", ["Social Science"]],
]

_cols = ["ID", "NAME", "ADDRESS", "SUBJECTS"]

df_2 = spark.createDataFrame(data=_data, schema=_cols)
df_2.printSchema()
df_2.show(10, False)
Example DataFrame 2

Now, we add the missing columns to each DataFrame.

# Before we can merge the DataFrames, we have to add the extra columns from each
from pyspark.sql.functions import lit

# Let's add the columns missing from df_1 (present only in df_2)
for col in df_2.columns:
    if col not in df_1.columns:
        df_1 = df_1.withColumn(col, lit(None))

# Let's add the columns missing from df_2 (present only in df_1)
for col in df_1.columns:
    if col not in df_2.columns:
        df_2 = df_2.withColumn(col, lit(None))

# View the dataframes
df_1.show(10, False)
df_2.show(10, False)
Fix both DataFrames

Finally, we are ready to merge.

# Let's use unionByName to do the merge successfully
df = df_1.unionByName(df_2)
df.printSchema()
df.show(10, False)
Merged DataFrame

Check out the iPython Notebook on GitHub —

Check out the PySpark Medium series —
