PySpark — Connect Azure ADLS Gen 2
Cloud distributed storage services such as Google GCS, Amazon S3 and Azure ADLS often serve as data endpoints in big data workloads.
Today, we are going to connect Azure ADLS to our PySpark cluster. To begin with, we need an Azure account and a Storage Account.
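On the Spark side, the cluster also needs the ABFS connector, which ships in the hadoop-azure artifact. Below is a minimal sketch, assuming a PySpark build bundled with Hadoop 3.3.x. The connector version (3.3.4) and the app name are assumptions, so match the version to the Hadoop libraries your Spark distribution actually ships with.

```python
# Minimal sketch, assuming PySpark bundled with Hadoop 3.3.x.
# ASSUMPTION: hadoop-azure 3.3.4; align this with your Spark's Hadoop version.
HADOOP_AZURE_PACKAGE = "org.apache.hadoop:hadoop-azure:3.3.4"


def build_spark_session(app_name: str = "adls-gen2-demo",
                        package: str = HADOOP_AZURE_PACKAGE):
    """Create (or reuse) a SparkSession that pulls the ABFS connector from Maven."""
    # Imported lazily so this module loads even where PySpark is not installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)  # hypothetical app name
        # spark.jars.packages resolves the connector and its dependencies at startup.
        .config("spark.jars.packages", package)
        .getOrCreate()
    )
```

Note that spark.jars.packages is only read when the JVM starts, so it has to be set before the first session is created (or passed to spark-submit via --packages).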
Once you have the Storage Account and a Blob container deployed (as in our case), we can move on to authentication.
Check out the Azure documentation to create an ADLS Gen 2 container: https://learn.microsoft.com/en-us/azure/storage/blobs/create-data-lake-storage-account
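Once the container exists, Spark addresses it through the abfss:// scheme on the account's DFS endpoint (ADLS Gen 2 is a storage account with hierarchical namespace enabled). The helper below is a small sketch for building such paths; the function name and the sample container/account names are hypothetical.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build a secure ABFS URI for a path inside an ADLS Gen 2 container."""
    # ADLS Gen 2 is reached via the DFS endpoint (dfs.core.windows.net),
    # not the Blob endpoint (blob.core.windows.net).
    base = f"abfss://{container}@{account}.dfs.core.windows.net/"
    # Strip leading slashes so we never produce a double slash after the host.
    return base + path.lstrip("/")


# Example with hypothetical names:
# abfss_uri("raw", "mystorageacct", "sales/2023.csv")
```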
Next, create the Service Principal (SP) that Spark will use to access the storage account. Go to Home > Azure Active Directory > App Registrations > New Registration.
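The app registration gives us three values: the Application (client) ID and Directory (tenant) ID from its overview page, plus a client secret created under Certificates & secrets. A minimal sketch for keeping them out of the notebook is to read them from environment variables; the variable names below follow a common Azure convention but are an assumption here, as are the class and function names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServicePrincipal:
    tenant_id: str      # "Directory (tenant) ID" from the app registration
    client_id: str      # "Application (client) ID"
    client_secret: str  # secret value created under "Certificates & secrets"

    @property
    def token_endpoint(self) -> str:
        # OAuth 2.0 token endpoint for the tenant, used by the ABFS client-credentials flow.
        return f"https://login.microsoftonline.com/{self.tenant_id}/oauth2/token"


# ASSUMPTION: hypothetical helper; the env var names are a convention, not a requirement.
def service_principal_from_env(env: Mapping[str, str] = os.environ) -> ServicePrincipal:
    """Read SP credentials from environment variables, failing loudly if any is missing."""
    keys = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return ServicePrincipal(env["AZURE_TENANT_ID"], env["AZURE_CLIENT_ID"], env["AZURE_CLIENT_SECRET"])
```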
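Once the Service Principal is created, let's assign it a data-access role on the storage account, such as Storage Blob Data Contributor (read/write) or Storage Blob Data Reader (read-only). Go to Home > Storage Accounts > {Your Account} > Access Control (IAM) > Add > Add role assignment.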
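With the role in place, Spark can authenticate as the Service Principal using the hadoop-azure OAuth client-credentials provider. The sketch below builds the per-account ABFS settings and applies them to an existing session with spark.conf.set, which open-source Spark merges into the Hadoop configuration used by DataFrame reads and writes. The function names are hypothetical, and the credentials are assumed to come from wherever you stored them in the previous step.

```python
def abfs_oauth_conf(account: str, tenant_id: str, client_id: str, client_secret: str) -> dict:
    """Per-account ABFS settings for OAuth with a Service Principal (client credentials)."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def apply_conf(spark, conf: dict) -> None:
    """Set each key on the session; Spark copies these into its Hadoop conf for reads/writes."""
    for key, value in conf.items():
        spark.conf.set(key, value)


# Usage, with hypothetical account/container names and an existing `spark` session:
# apply_conf(spark, abfs_oauth_conf("mystorageacct", tenant_id, client_id, client_secret))
# df = spark.read.csv("abfss://raw@mystorageacct.dfs.core.windows.net/sales.csv", header=True)
```

If you prefer to fix these at startup instead, the same keys can be passed to the session builder with a spark.hadoop. prefix.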