When I check the documentation:
You can define a table as having the IDENTITY property when you first create the table by using syntax that is similar to the following statement:
CREATE TABLE dbo.T1
( C1 INT IDENTITY(1,1) NOT NULL
, C2 INT NULL
)
WITH
( DISTRIBUTION = HASH(C2)
, CLUSTERED COLUMNSTORE INDEX
)
;
Update:
After verifying, I think the GENERATED ALWAYS AS IDENTITY syntax is not supported directly in Delta Lake.
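For reference, this is the kind of statement that fails (a sketch; the table name is illustrative, and on runtimes that do support Delta identity columns this form is accepted):
# Sketch: the identity syntax rejected on this runtime
try:
    spark.sql("""
        CREATE TABLE Staging.DimContract_NAC_Test_Identity
        (
            DimContractKey BIGINT GENERATED ALWAYS AS IDENTITY,
            IdentCode INT
        )
        USING DELTA
    """)
except Exception as e:
    print(e)  # expect a parse/analysis error here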
Instead, you can use a workaround to simulate an auto-incrementing column.
- Create the table with a plain BIGINT key column in place of the identity column (Spark will generate the values at insert time).
%%sql
CREATE TABLE Staging.DimContract_NAC_Test
(
    DimContractKey BIGINT,
    IdentCode INT,
    ContractId INT,
    SourceUpdatedDate TIMESTAMP,
    EffectiveStartDate DATE,
    EffectiveEndDate DATE,
    ETLId INT,
    SourceId INT,
    IsIdentCodeDuplicate BOOLEAN
)
USING DELTA;
- Insert data into the table, generating the auto-incrementing column with Spark.
from pyspark.sql import SparkSession
from pyspark.sql.functions import monotonically_increasing_id, to_timestamp, to_date

# Initialize the Spark session (in a notebook, `spark` usually already exists)
spark = SparkSession.builder.appName("AutoIncrementColumn").getOrCreate()

# Load your data into a DataFrame; date/time values arrive as strings and are cast below
data = [
    (123, 1, '2023-06-01 12:00:00', '2023-06-01', '2024-06-01', 1001, 2001, True),
    (124, 2, '2023-06-02 13:00:00', '2023-06-02', '2024-06-02', 1002, 2002, False),
    # Add more rows as needed
]
schema = ("IdentCode INT, ContractId INT, SourceUpdatedDate STRING, "
          "EffectiveStartDate STRING, EffectiveEndDate STRING, "
          "ETLId INT, SourceId INT, IsIdentCodeDuplicate BOOLEAN")
df = spark.createDataFrame(data, schema)

# Cast the string columns to the TIMESTAMP/DATE types declared in the table,
# otherwise Delta's schema enforcement rejects the append
df = (df.withColumn("SourceUpdatedDate", to_timestamp("SourceUpdatedDate"))
        .withColumn("EffectiveStartDate", to_date("EffectiveStartDate"))
        .withColumn("EffectiveEndDate", to_date("EffectiveEndDate")))

# Add an auto-incrementing column: values are unique and increasing,
# but not consecutive (gaps appear across partitions)
df_with_id = df.withColumn("DimContractKey", monotonically_increasing_id())

# Reorder columns to match the table definition
df_with_id = df_with_id.select("DimContractKey", "IdentCode", "ContractId",
                               "SourceUpdatedDate", "EffectiveStartDate",
                               "EffectiveEndDate", "ETLId", "SourceId",
                               "IsIdentCodeDuplicate")

# Write the DataFrame to the Delta table
df_with_id.write.format("delta").mode("append").saveAsTable("Staging.DimContract_NAC_Test")
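Note that monotonically_increasing_id() only guarantees unique, increasing values: the IDs are not consecutive, and they restart on every run, so a second append can collide with existing keys. If you need sequential keys that continue from the table's current maximum, a common alternative (a sketch, assuming the table and the `df` built in the steps above) is row_number() over a window:
from pyspark.sql.functions import row_number, lit, coalesce, max as max_
from pyspark.sql.window import Window

# Current highest key in the target table (0 when the table is empty)
max_key = (spark.table("Staging.DimContract_NAC_Test")
           .agg(coalesce(max_("DimContractKey"), lit(0)))
           .first()[0])

# A global window funnels all rows through one partition, which is
# usually acceptable for dimension-sized loads; the sort key is illustrative
w = Window.orderBy("IdentCode")
df_seq = (df.withColumn("DimContractKey", row_number().over(w) + max_key)
            .select("DimContractKey", *df.columns))

df_seq.write.format("delta").mode("append").saveAsTable("Staging.DimContract_NAC_Test")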