How to Create Delta Table in Azure Synapse Analytics with Id Auto Increment Identity Column ?

Vedant Desai 656 Reputation points
2024-06-04T06:11:14.4766667+00:00

I have created the Delta Lake Delta tables In ADLS using Synapse Notebook and in that table, I want to add an identity column (Auto increment 1,1) but I am not able to create the same, Below is my Create table script and error which i am facing.

Table Script:

%%sql
CREATE TABLE Staging.DimContract_NAC_Test
(
    DimContractKey BIGINT GENERATED ALWAYS AS IDENTITY,
    IdentCode INT,
    ContractId INT,
    SourceUpdatedDate TIMESTAMP,
    EffectiveStartDate DATE,
    EffectiveEndDate DATE,
    ETLId INT,
    SourceId INT,
    IsIdentCodeDuplicate boolean
)
USING DELTA;

Error:

Syntax error at or near 'GENERATED'(line 3, pos 26) == SQL == CREATE TABLE Staging.DimContract_NAC_Test ( DimContractKey BIGINT GENERATED ALWAYS AS IDENTITY, --------------------------^^^ IdentCode INT, ContractId INT, SourceUpdatedDate TIMESTAMP, EffectiveStartDate DATE, EffectiveEndDate DATE, ETLId INT, SourceId INT, IsIdentCodeDuplicate boolean ) USING DELTA org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306) org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143) org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52) org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:89) io.delta.sql.parser.DeltaSqlParser.$anonfun$parsePlan$1(DeltaSqlParser.scala:78) io.delta.sql.parser.DeltaSqlParser.parse(DeltaSqlParser.scala:111) io.delta.sql.parser.DeltaSqlParser.parsePlan(DeltaSqlParser.scala:75) com.microsoft.azure.synapse.ml.predict.SynapsePredictParser.parsePlan(SynapsePredictParser.scala:19) org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620) org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:120) org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620) org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617) org.apache.livy.repl.SQLInterpreter.execute(SQLInterpreter.scala:150) org.apache.livy.repl.Session.$anonfun$executeCode$1(Session.scala:814) scala.Option.map(Option.scala:230) org.apache.livy.repl.Session.executeCode(Session.scala:811) org.apache.livy.repl.Session.$anonfun$execute$9(Session.scala:536) org.apache.livy.repl.Session.withRealtimeOutputSupport(Session.scala:1038) org.apache.livy.repl.Session.$anonfun$execute$2(Session.scala:536) scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) scala.util.Success.$anonfun$map$1(Try.scala:255) scala.util.Success.map(Try.scala:213) scala.concurrent.Future.$anonfun$map$1(Future.scala:292) scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) java.lang.Thread.run(Thread.java:750)

Screenshot 2024-06-04 113610

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,481 questions
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,988 questions
Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,215 questions
{count} votes

Accepted answer
  1. Amira Bedhiafi 26,261 Reputation points
    2024-06-04T08:22:47.0266667+00:00

    When I check the documentation :

    You can define a table as having the IDENTITY property when you first create the table by using syntax that is similar to the following statement:

    CREATE TABLE dbo.T1
    (    C1 INT IDENTITY(1,1) NOT NULL
    ,    C2 INT NULL
    )
    WITH
    (   DISTRIBUTION = HASH(C2)
    ,   CLUSTERED COLUMNSTORE INDEX
    )
    ;
    

    Update :

    After verifying I think GENERATED ALWAYS AS IDENTITY syntax is not supported directly in Delta Lake

    Instead, you can use a workaround to simulate an auto-incrementing column.

    • Create the table without the identity column.
    %%sql
    CREATE TABLE Staging.DimContract_NAC_Test
    (
        IdentCode INT,
        ContractId INT,
        SourceUpdatedDate TIMESTAMP,
        EffectiveStartDate DATE,
        EffectiveEndDate DATE,
        ETLId INT,
        SourceId INT,
        IsIdentCodeDuplicate BOOLEAN
    )
    USING DELTA;
    
    • Insert data into the table while generating an auto-incrementing column using Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import monotonically_increasing_id
    
    # Initialize Spark Session
    spark = SparkSession.builder.appName("AutoIncrementColumn").getOrCreate()
    
    # Load your data into a DataFrame
    data = [
        (123, 1, '2023-06-01 12:00:00', '2023-06-01', '2024-06-01', 1001, 2001, True),
        (124, 2, '2023-06-02 13:00:00', '2023-06-02', '2024-06-02', 1002, 2002, False),
        # Add more rows as needed
    ]
    
    columns = ["IdentCode", "ContractId", "SourceUpdatedDate", "EffectiveStartDate", "EffectiveEndDate", "ETLId", "SourceId", "IsIdentCodeDuplicate"]
    df = spark.createDataFrame(data, columns)
    
    # Add an auto-incrementing column
    df_with_id = df.withColumn("DimContractKey", monotonically_increasing_id())
    
    # Reorder columns if necessary
    df_with_id = df_with_id.select("DimContractKey", "IdentCode", "ContractId", "SourceUpdatedDate", "EffectiveStartDate", "EffectiveEndDate", "ETLId", "SourceId", "IsIdentCodeDuplicate")
    
    # Write the DataFrame to the Delta table
    df_with_id.write.format("delta").mode("append").saveAsTable("Staging.DimContract_NAC_Test")
    
    
    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.