PySpark — Create Spark Datatype Schema from String

Subham Khandelwal
2 min read · Oct 4, 2022


Are you also tired of manually writing the schema for a DataFrame using Spark SQL types such as IntegerType, StringType, StructType, etc.?

Then this is for you…

PySpark has a built-in function for exactly this task: _parse_datatype_string. The leading underscore marks it as a private helper, so its behaviour may change between releases.

# Import method _parse_datatype_string
from pyspark.sql.types import _parse_datatype_string
# Create new Schema for data
_schema_str = "id int, name string"
_schema = _parse_datatype_string(_schema_str)
Example 1
# One more example with not null column
_schema_str_2 = "id int not null, name double, subjects string"
_schema_2 = _parse_datatype_string(_schema_str_2)
Example 2

We can also parse complex data types such as Map or Array:

# Working on Complex types such as Map or Array
_schema_str_3 = "id int, name map<string, string>, subject array<string>"
_schema_3 = _parse_datatype_string(_schema_str_3)
Example 3

Check out iPython notebook on GitHub —
