Are you also tired of manually writing the schema for a DataFrame with Spark SQL types such as IntegerType, StringType, StructType, etc.?
Then this is for you…
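PySpark has a built-in helper for exactly this task: _parse_datatype_string. Note the leading underscore: it is a private helper in pyspark.sql.types, so its behavior may change between releases.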
# Import the helper _parse_datatype_string
from pyspark.sql.types import _parse_datatype_string

# Create a new schema for the data
# Note: this needs an active SparkSession, since the string is parsed by Spark itself
_schema_str = "id int, name string"
_schema = _parse_datatype_string(_schema_str)
print(_schema)
Example 1
# One more example with a NOT NULL column
_schema_str_2 = "id int not null, name double, subjects string"
_schema_2 = _parse_datatype_string(_schema_str_2)
print(_schema_2)
We can also parse complex data types such as Map or Array:
# Working with complex types such as Map or Array
_schema_str_3 = "id int, name map<string, string>, subject array<string>"
_schema_3 = _parse_datatype_string(_schema_str_3)
print(_schema_3)