Spark Job Definition: Use delta library

JLopez 61 Reputation points
2023-05-16T16:02:54.4+00:00

Hi all, I am trying to use the Delta library in a PySpark Spark job definition on a Spark 3.3 cluster, but I cannot make it work. My code is:

spark = SparkSession.builder.appName("DataLoad") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog").getOrCreate()
import delta
deltaTable = DeltaTable.forName(spark,LoadZone+'.'+TableName)
dfLogs=deltaTable.history()

The error is:

  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/filecache/12/delta.py", line 120, in log_execution
    deltaTable = DeltaTable.forName(spark,LoadZone+'.'+TableName)
NameError: name 'DeltaTable' is not defined

So it looks like it cannot import the library. How can I use the Delta library inside my Spark job definition?
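
For reference, DeltaTable is defined in the delta.tables module, so a bare "import delta" never binds the name DeltaTable, which matches the NameError above. (The traceback also shows the script file itself is named delta.py, which can shadow the delta package on the import path.) A minimal sketch of the usual import pattern, assuming the delta-spark package is available on the cluster; LoadZone and TableName below are hypothetical placeholders for the values used in the question:

from pyspark.sql import SparkSession
from delta.tables import DeltaTable  # import the class itself; "import delta" alone does not expose it

spark = (
    SparkSession.builder.appName("DataLoad")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

LoadZone = "bronze"     # hypothetical placeholder
TableName = "my_table"  # hypothetical placeholder

deltaTable = DeltaTable.forName(spark, LoadZone + "." + TableName)
dfLogs = deltaTable.history()  # DataFrame with the table's commit history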

Azure Synapse Analytics

1 answer

  1. JLopez 61 Reputation points
    2023-05-23T10:32:16.36+00:00

    Hi @ShaikMaheer-MSFT

    I did a test creating a simple Spark job definition configured with the library, and the Delta commands worked. Then I tried my old Spark job definition configured with the same library, and it did not work, so the issue persists even though Delta works in another definition. I then created a new Spark job definition with my code and it worked, so I suppose the issue lies in the old job definition, though I cannot understand why. I will do more tests.
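
    For reference, a smoke test of that kind might look like the sketch below; the table path is a hypothetical placeholder, and DeltaTable.isDeltaTable simply returns whether the path holds a Delta table:

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = (
        SparkSession.builder.appName("DeltaSmokeTest")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # "/tmp/delta-smoke-test" is a hypothetical path; prints True only if it holds a Delta table.
    print(DeltaTable.isDeltaTable(spark, "/tmp/delta-smoke-test"))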

    Thanks for your help!!
