PySpark - Create Data Frame from List or RDD on the fly

Subham Khandelwal
2 min read · Oct 4, 2022

PySpark provides a few convenient methods for creating data frames on the fly, either from Python iterables such as lists or from existing RDDs.

Method 1 — SparkSession range() method

# Create a DataFrame from a range of values
# (assumes an active SparkSession named `spark`, as in a notebook or the pyspark shell)
df_range_1 = spark.range(5)
df_range_1.show(5, truncate=False)
# You can optionally specify start, end and step as well
df_range_2 = spark.range(start = 1, end = 10, step = 2)
df_range_2.show(10, False)

Method 2 — Spark createDataFrame() method

# Create Python Native List of Data
_data = [
["1", "Ram"],
["2", "Shyam"],
["3", "Asraf"],
["4", None]
]
# Create the list of column names
_cols = ["id", "name"]
# Create Data Frame using the createDataFrame method
df_users = spark.createDataFrame(data = _data, schema=_cols)
df_users.printSchema()
# Check Data Frame
df_users.show(truncate=False)
