Azure Synapse Apache Spark Pools seemingly broken

Homero Rivera 0 Reputation points
2025-01-15T21:30:49.82+00:00

Apache Spark and Delta Lake are two well-established technologies that you can either run on a private cluster or use out of the box as part of a larger platform such as Databricks or Synapse.

Typically, you want a managed platform to keep yourself from misconfiguring Apache Spark, Delta Lake, or their underpinnings. Lately, however, we’ve had a bad experience with Apache Spark on Azure Synapse, and we just can’t figure it out.

Everything was working fine until a couple of weeks back. We coded data transformations in Synapse Notebooks, mostly in Python with Spark modules and methods. Those are meant to write data into Delta tables stored in a standard Data Lake Storage Gen2 resource in Azure.
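
For context, the writes look roughly like this (the path and data below are placeholders, not our real workload):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder ADLS Gen2 path; ours points at a standard storage account.
table_path = "abfss://container@mystorageaccount.dfs.core.windows.net/delta/my_table"

# A handful of rows appended to an existing Delta table.
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "name"])
df.write.format("delta").mode("append").save(table_path)
```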


One day, I ran one of those Synapse Notebooks like any other day. I was expecting a few inserts into a Delta table, a task that usually takes a few seconds… Only this time, it kept taking longer and longer: 10, 20, then 30 minutes passed, and the statement wouldn’t finish. There was obviously something wrong.

I looked at the path for this Delta table in the Data Lake Gen2 storage. The contents seemed normal.
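
(For anyone trying the same check from a notebook instead of the portal, listing the directory and its _delta_log looks like this, reusing the placeholder path from above; mssparkutils is available by default in Synapse notebooks:)

```python
# Placeholder path from the earlier snippet.
table_path = "abfss://container@mystorageaccount.dfs.core.windows.net/delta/my_table"

# List data files and the transaction log; an unusually large _delta_log
# is one thing that can slow Delta reads down.
for f in mssparkutils.fs.ls(table_path):
    print(f.name, f.size)
for f in mssparkutils.fs.ls(table_path + "/_delta_log"):
    print(f.name, f.size)
```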

Then I thought: why not try a simple select statement? So I clicked the “Run cell” button, and again the query ran for 10, 20, 30 minutes before I had to cancel it.
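
The cell was nothing special; it was essentially this (database and table names are illustrative):

```python
# A trivial read that normally returns in seconds on a table this small.
spark.read.format("delta").load(table_path).show()

# The SQL flavor hung exactly the same way.
spark.sql("SELECT * FROM my_database.my_table").show()
```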

Now, this is a table with fewer than a hundred rows, and the insert statement was only supposed to add about five! No way this could be due to table size.


So, that was using Synapse Notebooks with Apache Spark Pools… I was curious to see if the same would happen using the Data section in Synapse. Surprisingly, I got the data! No latency or anything.

Then I went to the Linked section, browsed to the directory containing the Delta data, and selected “Delta format” from the dropdown menu: again, I got the data!


There was no doubt: the Apache Spark Pools were somehow broken. I even created another Apache Spark Pool, and that didn’t work either.

As a last resort, I tried changing the firewall rules of the Data Lake Gen2 storage. Nothing changed: I didn’t get any errors, not even a 403 after blocking all access.
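
The test itself was as simple as this (placeholder path again). With all access blocked, I’d expect a fast authorization error if requests were actually reaching the storage account, but instead it just kept running:

```python
# With the storage firewall set to deny, this should fail fast with a
# 403-style authorization error if the request reaches the account at all.
try:
    spark.read.format("delta").load(table_path).limit(1).show()
except Exception as e:
    print(type(e).__name__, e)
```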

I also compared the logs from before we had this problem with the logs from after it happened. I couldn’t spot any difference, nothing that really indicates there’s a problem.


Hopefully this will reach someone with similar past experience, or an SME in Synapse who could explain what’s going on and help us fix it.

Azure Synapse Analytics

1 answer

  1. phemanth 15,755 Reputation points Microsoft External Staff Moderator
    2025-01-16T06:23:00.2066667+00:00

    @Homero Rivera

    Thanks for reaching out to Microsoft Q&A

    It seems like you're experiencing an issue with Azure Synapse Apache Spark Pools. Let's try to troubleshoot this together.

    Here are a few steps you can take to diagnose the problem. Please confirm after performing these checks, so that we can rule out these possibilities.

    1. Check Spark Pool Configuration: Ensure that the Spark pool configuration hasn't changed. Verify the number of nodes, node size, and auto-scaling settings, and please share a screenshot of the Apache Spark pool configuration.
    2. Resource Utilization: Monitor the resource utilization of your Spark pool. High CPU or memory usage might indicate that the pool is under heavy load, causing delays. Please also share a screenshot of the session settings from "Configure session".
    3. Cluster Logs: Examine the cluster logs for any errors or warnings that might provide clues. Look for any changes in the logs compared to when the system was functioning correctly.
    4. Network Latency: Check for any network latency issues between Synapse and your Data Lake Gen2 storage. Network issues can cause significant delays in data processing.
    5. Spark Version: Ensure that the Spark version being used is compatible with your code and Delta Lake version. Sometimes, updates or changes in versions can cause unexpected behavior.
    6. Data Skew: Investigate whether there's any data skew in your Delta tables; uneven distribution of data can lead to performance bottlenecks. A minimal way to check this from a notebook is sketched after this list.
    7. Query Optimization: Review your queries and transformations for any potential optimizations. Sometimes, small changes in the code can lead to significant performance improvements.

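    For point 6, and to separate a genuinely slow read from a hung one, a rough notebook sketch like the one below can help (the table path is a placeholder; adjust it to your environment):

    ```python
    import time
    from pyspark.sql.functions import spark_partition_id

    # Placeholder ADLS Gen2 path.
    table_path = "abfss://container@account.dfs.core.windows.net/delta/my_table"

    # Time a minimal read: a table with under a hundred rows should count in seconds.
    start = time.time()
    df = spark.read.format("delta").load(table_path)
    print("rows:", df.count(), "in", round(time.time() - start, 1), "s")

    # Rows per Spark partition; heavy skew shows up as one outsized partition.
    df.groupBy(spark_partition_id().alias("partition")).count().show()
    ```
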
    Hope this helps. Please do let us know if you have any further queries.

