PySpark - Create Data Frame from List or RDD on the fly
Oct 4, 2022
PySpark offers a few popular ways to create data frames on the fly: from a range of values, from Python iterables such as a list, or from an RDD.
Method 1 — SparkSession range() method
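# Get (or create) a SparkSession; the pyspark shell and most notebooks already provide one as `spark`
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

# Create a DataFrame from a range of values (a single `id` column holding 0 to 4)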
df_range_1 = spark.range(5)
df_range_1.show(5, truncate = False)
# You can optionally specify start, end (exclusive) and step as well
df_range_2 = spark.range(start = 1, end = 10, step = 2)
df_range_2.show(10, False)
Method 2 — Spark createDataFrame() method
# Create a native Python list of data
_data = [
    ["1", "Ram"],
    ["2", "Shyam"],
    ["3", "Asraf"],
    ["4", None]
]

# Create the list of column names
_cols = ["id", "name"]

# Create Data Frame using the createDataFrame method
df_users = spark.createDataFrame(data = _data, schema=_cols)
df_users.printSchema()

# Check Data Frame
df_users.show(truncate=False)