PySpark — What is Spark Connect?

Understand the benefits of Spark Connect and how to use it

Subham Khandelwal
7 min read · 1 day ago

With the release of Apache Spark 3.4, Spark Connect was introduced, which decouples client applications from Spark clusters. Previously, the Spark application and the Spark driver were tightly coupled, which led to multiple challenges, including dependency resolution and code debugging.

With Spark Connect, developers are in for a treat 😇 Spark can now be used from web apps, edge nodes, applications, IDEs, etc. with ease.

Spark Connect Architecture (Credits — Apache Spark Website)

I teach Big Data, Databricks, Spark, Data Engineering & Data Warehousing on my YouTube Channel — Ease With Data. Improve your PySpark skills with this Playlist, Spark Streaming with this Playlist and Databricks with this Playlist. Read this article for free here.

What is Spark Connect and How Does It Work? ☝️

Spark Connect follows a client-server architecture that lets applications connect to a Spark driver/cluster remotely. The Spark Connect client APIs in the application send commands/operations to the Spark Connect server over the network using gRPC (a remote procedure call framework originally developed at Google).
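To make the client side concrete, here is a minimal sketch in PySpark. It assumes PySpark 3.4+ is installed with the Connect extras and that a Spark Connect server is already running and reachable. The helper names `connect_url` and `run_remote_example` are hypothetical and only used for illustration. The host you pass in is an assumption about your setup, and 15002 is the default Spark Connect port.

```python
DEFAULT_CONNECT_PORT = 15002  # default port the Spark Connect server listens on


def connect_url(host: str, port: int = DEFAULT_CONNECT_PORT) -> str:
    """Build a Spark Connect connection string of the form sc://host:port.

    Hypothetical helper for illustration; not part of PySpark.
    """
    return f"sc://{host}:{port}"


def run_remote_example(url: str) -> int:
    """Open a remote session, run a tiny query on the server, return the row count.

    Hypothetical helper; assumes a Spark Connect server is reachable at `url`.
    """
    # Imported lazily so connect_url() works even without PySpark installed.
    from pyspark.sql import SparkSession

    # remote() creates a thin client session instead of starting a local driver.
    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        # The DataFrame operations are sent to the server, which executes them.
        even_ids = spark.range(10).filter("id % 2 = 0")
        return even_ids.count()
    finally:
        spark.stop()
```

Calling `run_remote_example(connect_url("localhost"))` would run the query on the remote cluster and return only the result to the client. The application code looks like ordinary PySpark; only the way the session is created changes.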

The Spark Connect server, which is part of the Spark cluster, receives the commands/operations and triggers the execution process. Spark Connect…
