PySpark — Create Spark Datatype Schema from String

Subham Khandelwal
Oct 4, 2022

Are you tired of manually writing the schema for a DataFrame using Spark SQL types such as IntegerType, StringType, and StructType?

Then this is for you…

PySpark has a built-in function for exactly this task: _parse_datatype_string. The leading underscore marks it as an internal helper, so its behavior may change between releases.

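# Import SparkSession and the method _parse_datatype_string
from pyspark.sql import SparkSession
from pyspark.sql.types import _parse_datatype_string
# The parser needs an active SparkContext, so start (or reuse) a session first
spark = SparkSession.builder.getOrCreate()
# Create new Schema for data
_schema_str = "id int, name string"
_schema = _parse_datatype_string(_schema_str)
print(_schema)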
Example 1
# One more example with not null column
_schema_str_2 = "id int not null, name double, subjects string"
_schema_2 = _parse_datatype_string(_schema_str_2)
print(_schema_2)

We can also parse complex data types such as Map or Array.

# Working on Complex types such as Map or Array
_schema_str_3 = "id int, name map<string, string>, subject array<string>"
_schema_3 = _parse_datatype_string(_schema_str_3)
print(_schema_3)
Example 3

Check out the iPython notebook on GitHub: https://github.com/subhamkharwal/ease-with-apache-spark/blob/master/2_create_schema_from_string.ipynb

