
PySpark — Connect Azure ADLS Gen 2

Dec 18, 2022

Cloud distributed storage services such as Google GCS, Amazon S3 and Azure ADLS often serve as data endpoints in big data workloads.

Today, we are going to connect Azure ADLS to our PySpark cluster. To begin with, we need an Azure account with a Storage Account created in it.

We start with the Storage Account and Blob container already deployed, as in our case.

Check out the Azure documentation to create an ADLS Gen 2 container: https://learn.microsoft.com/en-us/azure/storage/blobs/create-data-lake-storage-account
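The storage account name and the container name are the two values Spark later needs to address the data. As a minimal sketch, assuming the standard `abfss://` URI scheme used by the Hadoop ABFS driver, the path can be assembled like this. The helper name `abfss_uri` and the sample names `mystorageacct` and `raw` are placeholders for illustration, not values from this setup.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for an ADLS Gen 2 container (hypothetical helper)."""
    # Format: abfss://<container>@<account>.dfs.core.windows.net/<path>
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Placeholder names, for illustration only
example_uri = abfss_uri("raw", "mystorageacct", "/sales/2022/")
print(example_uri)
```

The `dfs.core.windows.net` endpoint is the Data Lake (hierarchical namespace) endpoint of the account, as opposed to the plain Blob endpoint.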

Next, create the Service Principal (SP) required to access the storage account. Go to Home > Azure Active Directory > App Registrations > New Registration.

Once the Service Principal is created, let's assign it the roles needed to access ADLS (for example, Storage Blob Data Contributor for read and write access). Go to Home > Storage Accounts > {Your Account} > Access Control (IAM) > Add > Add role assignment.
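With the Service Principal and its role assignment in place, Spark has to be told how to authenticate. The sketch below assumes the OAuth client-credentials flow of the Hadoop ABFS driver and that the `hadoop-azure` package is available on the cluster. The helper names `abfs_oauth_conf` and `apply_to_spark`, and all argument values, are hypothetical. The application (client) ID, client secret and tenant ID come from the App Registration created earlier.

```python
def abfs_oauth_conf(account: str, client_id: str,
                    client_secret: str, tenant_id: str) -> dict:
    """Return Hadoop ABFS OAuth settings for one storage account (hypothetical helper)."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        # Use OAuth instead of the storage account key
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        # Token provider for the client-credentials (Service Principal) flow
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        # Azure AD token endpoint for the tenant
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def apply_to_spark(spark, conf: dict) -> None:
    """Set each key on an existing SparkSession's runtime config."""
    for key, value in conf.items():
        spark.conf.set(key, value)
```

After calling `apply_to_spark(spark, abfs_oauth_conf(...))` on an active session, a read such as `spark.read.csv("abfss://<container>@<account>.dfs.core.windows.net/<path>")` should authenticate as the Service Principal. In practice the client secret is better read from an environment variable or a secret store than hard-coded in a notebook.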

Written by Subham Khandelwal
