Deadlock issue while running a notebook on Synapse

Pablo Chinchilla Valverde 26 Reputation points
2022-11-01T21:14:22.86+00:00

Hi, I recently ran into this issue while working with Synapse. I take the distinct values of a column from several FACT tables, union them, and then apply another distinct so that I end up with the unique values across all of the FACT tables. After that, I write the result as a Delta file to a specific path.
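
A minimal sketch of the steps described above, assuming each FACT table is already loaded as a Spark DataFrame. The function names, the column name, and the paths in the usage comment are hypothetical and are not taken from the actual notebook.

```python
from functools import reduce


def distinct_union(frames, column):
    """Return the unique values of `column` across all DataFrames in `frames`."""
    if not frames:
        raise ValueError("at least one DataFrame is required")
    # Step 1: distinct values of the column from each FACT table.
    per_table = [df.select(column).distinct() for df in frames]
    # Step 2: union the per-table results by column name.
    combined = reduce(lambda left, right: left.unionByName(right), per_table)
    # Step 3: a final distinct removes values shared between tables.
    return combined.distinct()


def write_delta(df, path):
    """Overwrite `path` with `df` in Delta format."""
    df.write.format("delta").mode("overwrite").save(path)


# Hypothetical usage in a notebook where `spark` already exists:
# facts = [spark.read.format("delta").load(p) for p in fact_paths]
# write_delta(distinct_union(facts, "customer_id"), output_path)
```

Because Spark evaluates lazily, nothing runs until the write, so all of the distinct and union steps are planned together as part of that single action.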

Sometimes this takes 6-8 minutes, which is expected given the volume of data. But sometimes, as you can see in the image below, there seems to be a problem with the job assignment process: after one job finishes, it takes a while for the next one to start. You can see this by comparing the total run time with the total job execution time.
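
To put a number on that comparison, the idle time between jobs can be computed from the job start and end times shown in the Spark UI. This is only a rough sketch: `scheduling_gaps` is a hypothetical helper, the timestamps have to be copied in by hand, and it assumes "job execution time" means the time during which at least one job was running.

```python
from datetime import datetime, timedelta


def scheduling_gaps(jobs):
    """Split the run into busy and idle time.

    `jobs` is a list of (start, end) datetime pairs, one per Spark job.
    Returns (busy, idle) as timedeltas, where idle is the time between
    the end of one job and the start of the next with no job running.
    """
    busy = timedelta()
    idle = timedelta()
    cursor = None  # latest job end time seen so far
    for start, end in sorted(jobs):
        if cursor is None or start >= cursor:
            if cursor is not None:
                idle += start - cursor  # gap with no job running
            busy += end - start
            cursor = end
        elif end > cursor:
            # Overlapping job: count only the part not already covered.
            busy += end - cursor
            cursor = end
    return busy, idle
```

A large idle value relative to the busy value would point at time spent outside job execution, for example in planning or waiting for the next job to be submitted, rather than in the jobs themselves.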

I'm using pure PySpark code to do this, with the following custom configuration:

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "DYNAMIC")
spark.conf.set("spark.databricks.io.cache.enabled", True)
spark.conf.set("spark.databricks.adaptive.autoOptimizeShuffle.enabled", True)
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", True)
spark.conf.set("spark.sql.adaptive.enabled", True)
spark.conf.set("spark.sql.adaptive.localShuffleReader.enabled", True)
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", True)
spark.conf.set("spark.sql.hive.manageFilesourcePartitions", True)
spark.conf.set("spark.sql.hive.metastorePartitionPruning", True)
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", True)
spark.conf.set("spark.sql.parquet.filterPushdown", True)
spark.conf.set("spark.sql.join.preferSortMergeJoin", True)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
spark.conf.set("spark.sql.broadcastTimeout", "3600000ms")
spark.conf.set("spark.sql.debug.maxToStringFields", 1000)
spark.conf.set("spark.sql.shuffle.partitions", 10000)
spark.conf.set("spark.sql.hive.filesourcePartitionFileCacheSize", 786432000)

256215-image.png

I have already researched this issue but didn't find anything helpful. If anyone can help me understand what is happening, I would be grateful!


1 answer

  1. Vahid Ghafarpour 17,870 Reputation points
    2023-08-03T17:30:36.5333333+00:00

    The configuration spark.sql.optimizer.dynamicPartitionPruning.enabled enables dynamic partition pruning, which can significantly improve query performance. However, ensure that your tables are partitioned correctly and the partition columns are used in the query predicates for effective pruning.
