PySpark — Spark Streaming Error and Exception Handling
Understand how to handle Spark Streaming errors and exceptions
We all aspire to write picture-perfect code, and we even try to, but what if the data itself is bad or malformed, or some service (say, a JDBC sink) is down while we are processing it? Even well-written code is not truly ready until it can handle such scenarios, i.e. capture the data whose processing failed due to errors or exceptions so it can be re-processed.
Today we will understand and handle such errors and exceptions in Spark Structured Streaming code.
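As a preview of the pattern, here is a minimal sketch of the core idea: wrap the per-micro-batch write in a try/except inside foreachBatch, and persist any batch that fails to a holding location for later re-processing. The JDBC URL, table name, credentials, and the /tmp/failed_batches path below are illustrative assumptions, not values from this post.

# A minimal sketch: wrap the per-batch JDBC write in try/except so a failed
# micro-batch is captured for re-processing instead of being lost.
def write_batch(batch_df, batch_id):
    try:
        (batch_df.write
            .format("jdbc")
            .option("url", "jdbc:postgresql://localhost:5432/demo")  # placeholder URL
            .option("dbtable", "device_data")                        # placeholder table
            .option("user", "postgres")                              # placeholder user
            .option("password", "postgres")                          # placeholder password
            .option("driver", "org.postgresql.Driver")
            .mode("append")
            .save())
    except Exception as e:
        # Capture the failed micro-batch (here as Parquet) so it can be
        # inspected and re-processed later.
        print(f"Batch {batch_id} failed: {e}")
        batch_df.write.mode("append").parquet(f"/tmp/failed_batches/{batch_id}")

# Attached to a streaming query with:
#   df.writeStream.foreachBatch(write_batch).start()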
Now, if you are new to Spark or PySpark, or want to learn more, I teach Big Data, Spark, Data Engineering & Data Warehousing on my YouTube channel, Ease With Data. Improve your PySpark skills with the Spark Streaming with PySpark Playlist.
Use Case 📃
We are reading Device Data JSON payloads with the structure shown below through Kafka in Spark Streaming. Our goal is to flatten and explode the data and save it in a Postgres table using JDBC; a minimal pipeline sketch follows the sample payload.
{
  "eventId": "e3cb26d3-41b2-49a2-84f3-0156ed8d7502",
  "eventOffset": 10001,
  "eventPublisher": "device",
  "customerId": "CI00103",
  "data": {
    "devices": [
      {
        "deviceId": "D001"…