Data Lakehouse with PySpark — Set up a PySpark Docker Jupyter Lab environment

Subham Khandelwal
3 min read · Feb 10, 2023

As part of the series Data Lakehouse with PySpark, we need to set up a PySpark environment on Docker with Jupyter Lab. Today we are going to do exactly that in a few simple steps. This environment can then be used for your own use cases and practice.


If you are not already following along, check out the playlist on YouTube — https://youtube.com/playlist?list=PL2IsFZBGM_IExqZ5nHg0wbTeiWVd8F06b

There are a few prerequisites for the environment setup; see the image below. We will not need an AWS account for now, but one will definitely be needed later in the course.

Prerequisites

Now, install Docker Desktop on your machine. You can download it from the official Docker website — https://www.docker.com/.

The installation is pretty straightforward and does not require any special configuration.
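Once installed, you can optionally confirm that the Docker CLI is available. This is just a small sketch (my addition, not a step from the original article); running docker --version directly in a terminal works equally well.

```python
# Optional sanity check (an assumption, not part of the original steps):
# confirm the Docker CLI is installed and reachable from this machine.
import subprocess

result = subprocess.run(
    ["docker", "--version"],
    capture_output=True,
    text=True,
    check=True,  # raises an error if the command exits with a failure
)
print(result.stdout.strip())  # e.g. "Docker version 24.0.x, build ..."
```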

Turn on Docker Desktop and wait until the Docker Engine status colour changes to GREEN.

Docker Desktop
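With Docker Desktop running, the next step in the series is to launch a Jupyter Lab container with PySpark (see the YouTube playlist above for the exact image and commands). As a hedged sketch, once you are inside a notebook in that environment, a snippet like the following can verify that Spark works end to end; the application name and sample data are purely illustrative.

```python
# A minimal smoke test to run inside a Jupyter Lab notebook once the
# PySpark container is up. Names and data here are illustrative only.
from pyspark.sql import SparkSession

# Build (or reuse) a local SparkSession using all available cores.
spark = (
    SparkSession.builder
    .appName("LakehouseSetupCheck")  # hypothetical app name
    .master("local[*]")
    .getOrCreate()
)

print("Spark version:", spark.version)

# Run a tiny job to confirm the session can actually execute work.
df = spark.createDataFrame([(1, "spark"), (2, "lakehouse")], ["id", "name"])
df.show()
```

If the Spark version prints and the two-row DataFrame displays, the environment is ready for the rest of the series.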
