Apache Spark Interview Series — Test your Knowledge 🧠

Questions on Apache Spark to test your interview readiness and your understanding of what Spark does behind the scenes

Subham Khandelwal
Mar 17, 2024

A curated series of Apache Spark/PySpark interview challenges to help you understand Spark better. The challenges are posted on LinkedIn, so make sure to follow me there. The answer to each challenge is posted in the comments section.


Note of Caution ⚠️

These questions test your practical understanding of and skills in Apache Spark; they are not generic Spark trivia. Proceed with caution 😉

Challenges 💬

Spark Challenge 1️⃣: Code execution on the Driver or on the Executors?

Spark Challenge 2️⃣: Read all CSV files in nested folders

Spark Challenge 3️⃣: Add the source file name as a column while reading CSV data

Spark Challenge 4️⃣: How many Jobs are triggered while reading a file?

Spark Challenge 5️⃣: Handling JSON data

Spark Challenge 6️⃣: Coalesce vs Repartition: how many output files?

Spark Challenge 7️⃣: Stages and DAG: determine the number of Stages

Spark Challenge 8️⃣: Spark Core API: Scala vs Python

Spark Challenge 9️⃣: JDBC optimization: how to read data faster?

Spark Challenge 1️⃣0️⃣: Reading complex JSON data

Spark Challenge 1️⃣1️⃣: Writing data

Spark Challenge 1️⃣2️⃣: Estimate the number of Partitions

This page will be updated as new questions are posted on LinkedIn. Follow me on LinkedIn so you don't miss any content.

Important Links and References 🏷️

LinkedIn Profile: https://www.linkedin.com/in/subhamkharwal

PySpark Zero to Hero Series on YouTube: https://youtube.com/playlist?list=PL2IsFZBGM_IHCl9zhRVC1EXTomkEp_1zm&si=Q664l-TFXf4wj1We

Spark Streaming with PySpark on YouTube: https://youtube.com/playlist?list=PL2IsFZBGM_IEtp2fF5xxZCS9CYBSHV2WW&si=4rF9V-Px9EJTiIiU

Check out the Ease With Data YouTube channel: https://www.youtube.com/@easewithdata

Want to connect with me? https://topmate.io/subham_khandelwal
