PySpark — Connect Azure ADLS Gen 2

Subham Khandelwal
3 min read · Dec 18, 2022

Cloud distributed storage services such as Google GCS, Amazon S3 and Azure ADLS often serve as data endpoints in big data workloads.

Today, we are going to connect Azure ADLS to our PySpark cluster. To begin with, we need an Azure account with a Storage Account created.

We assume you already have a Storage Account and a Blob container deployed, as in our case.

Check out the Azure documentation to create an ADLS Gen 2 container: https://learn.microsoft.com/en-us/azure/storage/blobs/create-data-lake-storage-account

Azure Blob Container
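
For reference, Spark addresses an ADLS Gen 2 container through an abfss:// URI built from the container name and the Storage Account name. Here is a minimal sketch of that format; the helper abfss_uri and the sample names are hypothetical placeholders, not values from our setup.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside an ADLS Gen 2 container."""
    # Format: abfss://<container>@<account>.dfs.core.windows.net/<path>
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    return f"{base}/{path.lstrip('/')}"


# Placeholder names; replace with your own container and Storage Account.
print(abfss_uri("mycontainer", "mystorageaccount", "raw/sales.csv"))
```

The dfs.core.windows.net endpoint is the one used by Storage Accounts with hierarchical namespace enabled, which is what makes the account ADLS Gen 2.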

Create the Service Principal (SP) required to access the storage. Go to Home > Azure Active Directory > App Registrations > New Registration

Service Principal

Once the Service Principal is created, let's assign the correct roles to access ADLS. Go to Home > Storage Accounts > {Your Account} > Access Control (IAM) > Add > Add role assignment
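
With the role assigned, the Service Principal's credentials are what Spark will use to authenticate. One common way to pass them is through the OAuth client-credentials settings of the Hadoop ABFS driver (hadoop-azure). The sketch below only builds those settings as a plain dictionary. The function name adls_oauth_conf and all sample values are placeholders, and it assumes the hadoop-azure package is available on the cluster.

```python
def adls_oauth_conf(account, client_id, client_secret, tenant_id):
    """Return Hadoop ABFS OAuth settings for one Storage Account."""
    host = f"{account}.dfs.core.windows.net"
    return {
        # Use OAuth with the client-credentials (Service Principal) flow.
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        # Application (client) ID and secret of the App Registration.
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        # Token endpoint of the Azure AD tenant (directory).
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values; substitute the ones from your App Registration.
conf = adls_oauth_conf(
    "mystorageaccount",
    "<application-client-id>",
    "<client-secret>",
    "<directory-tenant-id>",
)
for key in conf:
    # On a live session each pair would be applied with spark.conf.set(key, value).
    print(key)
```

Each key/value pair can then be applied to an existing session with spark.conf.set(key, value). In practice, keep the client secret out of source code, for example by reading it from an environment variable or a secret store.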
