Hello Tian,
Thank you for posting on Microsoft Learn.
Your issue is most likely related to internal changes in how Python UDFs are executed in the new runtime, or to how the runtime interacts with Python dependencies or Spark execution plans.
Go to the Databricks Runtime 16.4 LTS Release Notes and look for:
- Python UDF execution changes
- Python version upgrades (e.g., 3.10.x to 3.11.x)
- Internal API or serialization differences
- Behavior changes in Arrow or Pandas UDFs
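Before combing through the notes, it helps to pin down exactly which versions changed between the two runtimes. A minimal sketch you could run in a notebook cell on both the old and the new cluster and diff the output (it assumes the standard `spark` session object available in Databricks notebooks):

```python
import sys
import pandas as pd
import pyarrow

# Compare these values between the old and new runtime
print("Python :", sys.version)
print("Spark  :", spark.version)
print("pandas :", pd.__version__)
print("pyarrow:", pyarrow.__version__)
```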
You can modify your UDF to include logging or print statements to identify whether the slowness is inside the UDF itself or in how Spark schedules tasks:
```python
def my_udf(x):
    import time
    start = time.time()
    result = ...  # your actual transformation goes here
    print(f"Processed {x} in {time.time() - start:.2f}s")
    return result
```
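Keep in mind that `print` output from inside a UDF ends up in the executor logs (viewable through the Spark UI), not in the notebook cell. A hedged sketch of wiring the timed UDF into a query, where `df` and its string column `value` are placeholder names:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

timed_udf = F.udf(my_udf, StringType())

# Force full execution without materializing output, so the timing reflects the UDF itself
df.select(timed_udf(F.col("value"))).write.format("noop").mode("overwrite").save()
```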
Some UDFs lazily import modules inside the function body, which adds overhead on every call. This can become more problematic with updated runtimes, so you may need to move imports outside the UDF:
```python
import json  # imported once per worker process, not on every call

def my_udf(x):
    return json.loads(x)
```
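If you also call the UDF from SQL, the same pattern applies when registering it. A small sketch, where `parse_json`, `raw_col`, and `some_table` are placeholder names:

```python
from pyspark.sql.types import StringType

# Register the module-level-import version for use in SQL
spark.udf.register("parse_json", my_udf, StringType())
spark.sql("SELECT parse_json(raw_col) AS parsed FROM some_table").show()
```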
If you're using `@udf` or `F.udf(...)`, try re-registering it with explicit return types:
```python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(returnType=StringType())
def my_udf(x):
    ...
```
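When no return type is given, PySpark assumes `StringType`, so declaring it explicitly mainly removes ambiguity from the plan. A quick usage sketch, with `df` and `value` again as placeholder names:

```python
# UDFs accept column names directly, so this applies my_udf to the `value` column
df.select(my_udf("value").alias("result")).show(5)
```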
Another thing: if you are using Pandas UDFs, try disabling Arrow temporarily:

```python
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
```

If this fixes the issue, it's likely due to an Arrow/PyArrow version mismatch or serialization bugs in the new runtime.
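As a follow-up experiment, you can compare a Pandas UDF against an equivalent row-at-a-time Python UDF on the same data; if the plain UDF is dramatically faster on 16.4, that also points at the Arrow serialization path. A hedged sketch where `df`, the column `value`, and the upper-casing body are all illustrative assumptions:

```python
import pandas as pd
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.pandas_udf(StringType())
def upper_pandas(s: pd.Series) -> pd.Series:
    return s.str.upper()  # illustrative vectorized body

@F.udf(returnType=StringType())
def upper_plain(x):
    return x.upper() if x is not None else None  # same logic, row at a time

# Run both over the same column and compare wall-clock times in the Spark UI
df.select(upper_pandas("value")).write.format("noop").mode("overwrite").save()
df.select(upper_plain("value")).write.format("noop").mode("overwrite").save()
```

Remember to set `spark.sql.execution.arrow.pyspark.enabled` back to `"true"` once you're done testing.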