Using Delta Tables in Azure Synapse - use across Notebooks and Data Flows

Stephen Connell 21 Reputation points
2022-11-11T16:50:53.787+00:00

Hi, I am experiencing some issues with Delta tables in an Azure Synapse workspace. I have used Notebooks to create Delta tables which I then try to use in Data Flow transformations, and I have used Data Flows to create Delta tables which I then try to modify or describe in Notebooks. In both scenarios the two approaches do not appear to be compatible with one another.

When I create a Delta table in a Notebook it causes the following issues within Data Flows.
I have tried this with both a Spark 3.2 / Delta 1.2 Apache Spark pool and a Spark 3.1 / Delta 1.0 pool.

1) Spark SQL CREATE TABLE method.

CREATE TABLE [DATABASE].[TABLE] (
  ID INT NOT NULL
  …
)
USING DELTA;

This table can then be read in a Data Flow source using an inline Delta dataset.


When this statement was run on the Spark 3.2 / Delta 1.2 pool, attempting to write to the table from a Data Flow generates the following error:

Delta protocol version is too new for this version of the Databricks Runtime. Please upgrade to a newer release..   

Running the CREATE TABLE statement with the prior library (Spark 3.1 / Delta 1.0) does allow the write to work.

Adding a check constraint to the table then prevents Data Flow writes into the table. This appears to change the minWriterVersion from 2 to 3 in the transaction log.

ALTER TABLE [DATABASE].[TABLE] ADD CONSTRAINT [CONSTRAINTNAME] CHECK ([CONDITION]);   

We then get the same error as we previously had for the CREATE TABLE run on the pool with the Delta 1.2 library.
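
For reference, the protocol versions the table is asking for can be checked from a Notebook. This is a minimal sketch using DESCRIBE DETAIL with the same placeholder names as above (as far as I can tell the detail output includes minReaderVersion and minWriterVersion columns):

# Sketch: inspect the Delta protocol versions the table currently requires.
# [DATABASE].[TABLE] is the same placeholder as in the statements above.
detail = spark.sql("DESCRIBE DETAIL [DATABASE].[TABLE]")
detail.select("minReaderVersion", "minWriterVersion").show()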

2) PySpark DataFrame.write.format("delta") method.

e.g.

df.write.format("delta").save(delta_table_path)  

When the table is written with the most recent Delta library (1.2) we get the above failures, but with the older library (1.0) this method of creating tables works.

I have also encountered a bug with the earlier library which renders any transaction prior to a checkpoint invalid for time travel, so it is not a good solution for creating Delta tables.
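
For context, this is the kind of time-travel read the bug affects; a minimal sketch using the same delta_table_path placeholder as above:

# Sketch: read an earlier version of the table (time travel).
# The bug mentioned above makes versions before a checkpoint unreadable.
old_df = spark.read.format("delta").option("versionAsOf", 0).load(delta_table_path)
old_df.show()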

Conversely, if I use a Data Flow to create a Delta table and then attempt to work with it in a Notebook, I get errors.

%%sql  
CREATE TABLE IF NOT EXISTS GreenTaxi.Trips  
USING DELTA   
LOCATION 'taxi/delta/Green/Simple/';  

I get an error:

Error: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:null)

If I try to load the table into Spark, say to read its history:

from delta.tables import *
from pyspark.sql.functions import *
delta_table = DeltaTable.forPath(spark, "taxi/delta/Green/Simple/")
delta_table.history().show(20, 1000, False)

I get the message:

AnalysisException: taxi/delta/Green/Simple/ is not a Delta table.
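
For completeness, the path above is relative; as I understand it, relative paths are resolved against the pool's default file system, so an equivalent fully qualified ADLS path would look like the sketch below (container and storage account names are placeholders):

# Sketch: the same history call with a fully qualified path.
# [CONTAINER] and [ACCOUNT] are placeholders.
full_path = "abfss://[CONTAINER]@[ACCOUNT].dfs.core.windows.net/taxi/delta/Green/Simple/"
delta_table = DeltaTable.forPath(spark, full_path)
delta_table.history().show(20, 1000, False)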

I have a few questions:

  1. Is there some mechanism to set the library used within an Azure Integration Runtime to ensure it is compatible with the version of Delta being used by Spark Pools?
  2. Are there any suggestions on the best way to use Delta within Synapse, and what are the best practices?
  3. What are the plans for upgrading the Delta library for Azure Integration Runtimes?

Happy to add more detail if required.
Kind regards.
Stephen.

Azure Synapse Analytics
1 answer

  1. Stephen Connell 21 Reputation points
    2023-10-09T09:32:26.6166667+00:00

    Hi, I thought that I would follow up.
    Mapping Data Flow inline sources and sinks for Delta can now read and write Delta tables which have:

    Key                      Value
    delta.minReaderVersion   1
    delta.minWriterVersion   3
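
    For context, those are the versions a table ends up on after adding a check constraint. As far as I know the protocol can also be raised explicitly through table properties; a minimal sketch with the same placeholder naming as the question (noting that protocol upgrades are one-way):

    # Sketch: raise the Delta protocol versions explicitly (placeholder table name).
    # A protocol upgrade cannot be rolled back.
    spark.sql("""
        ALTER TABLE [DATABASE].[TABLE]
        SET TBLPROPERTIES ('delta.minReaderVersion' = '1', 'delta.minWriterVersion' = '3')
    """)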

    Reading and writing these tables was not possible in mid-August, so it is great to see that finally resolved. It means Delta tables with check constraints can now be used from Data Flows. However, change data feed is still not supported; writing to a table with delta.enableChangeDataFeed set fails with:

    Operation on target Data flow1 failed: {"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'sink1': Cannot write to table with delta.enableChangeDataFeed set. Change data feed from Delta is not yet available.","Details":"org.apache.spark.sql.AnalysisException: Cannot write to table with delta.enableChangeDataFeed set. Change data feed from Delta is not yet available.\n\tat org.apache.spark.sql.delta.DeltaErrors$.cdcWriteNotAllowedInThisVersion(DeltaErrors.scala:407)\n\tat org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:156)\n\tat org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:150)\n\tat org.apache.spark.sql.delta.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:84)\n\tat org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:143)\n\tat org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:142)\n\tat org.apache.spark.sql.delta.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:84)\n\tat org.apache.spark.sql.delta.commands.WriteIntoDelta.write(WriteIntoDelta.scala:107)\n\tat org.apache.spark.sql.delta.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:66)\n\tat org.apac"}

    In further testing, I see that enabling CDC on a table, which raises the minWriterVersion to 4, does not yet work. However, there is definite progress.
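
    For reference, a minimal sketch (placeholder table name) of the property change that triggers this, since enabling the change data feed is what raises the required writer version to 4:

    # Sketch: enable change data feed on an existing table (placeholder name).
    # Setting this property raises the table's minWriterVersion to 4.
    spark.sql("""
        ALTER TABLE [DATABASE].[TABLE]
        SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')
    """)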

