PySpark — Delta Lake Column Mapping

Subham Khandelwal
3 min read · Nov 19, 2022

Delta Lake validates the schema of data that is being written to it. It supports explicit DDL operations to alter table schema definitions. Following are the types of changes that Delta Lake currently supports:

  1. Adding new columns
  2. Re-ordering columns
  3. Removing or renaming columns (some features are still experimental)
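As a quick illustration of the first kind of change, a new column can be added with standard Delta DDL (the table and column names below are just examples):

```sql
%%sparksql

-- Add a new nullable column to an existing Delta table;
-- existing rows will show NULL for the new column
ALTER TABLE sales_managed ADD COLUMNS (discount DOUBLE);
```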

We will look into the third point: removing or renaming columns.

Representation Image (Credit: Delta.io)

In situations where we need to update the schema of an existing table, we also have to change the schema of the underlying data. Delta Lake gives us a fantastic option for this called Column Mapping.

To use it, we first need to upgrade the protocol version of the existing table.

Protocol Versions for Delta Lake

To know more about protocol versions, check out https://docs.delta.io/latest/versioning.html

Let's run through an example to check out this feature. We start by creating a demo Delta table “sales_delta_mapping”:

```sql
%%sparksql

-- Create a demo Delta table from an existing managed table
CREATE TABLE sales_delta_mapping
USING delta
AS
SELECT * FROM sales_managed;
```
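With the table in place, we can upgrade its protocol and turn on column mapping, after which renaming and dropping columns becomes a pure metadata operation. The table properties below follow the documented Delta Lake requirements for column mapping (reader version 2, writer version 5); the column names are illustrative placeholders for columns in `sales_managed`:

```sql
%%sparksql

-- Upgrade the protocol and enable name-based column mapping
ALTER TABLE sales_delta_mapping SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
);

-- Rename a column without rewriting the underlying data files
-- (column names here are assumed for illustration)
ALTER TABLE sales_delta_mapping RENAME COLUMN amount TO sale_amount;

-- Drop a column; requires column mapping to be enabled
ALTER TABLE sales_delta_mapping DROP COLUMN sale_amount;
```

Note that enabling column mapping is a one-way protocol upgrade: older Delta Lake readers and writers will no longer be able to access the table afterwards.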
