Hello Gracia Espelt, Pol,
It seems a notebook that ran smoothly on a single-user access cluster is now failing on a shared access cluster. The error occurs during the .toPandas() operation, which suggests that something changed in the runtime environment rather than in your code.
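To help isolate it, you could first try converting a tiny, hypothetical DataFrame; if that also fails, the problem is likely in the cluster configuration rather than in your data:

# Minimal sketch: a toy DataFrame to test whether .toPandas() fails in general
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
pdf = df.toPandas()
print(pdf)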
I suspect the shared access cluster may have different configuration settings than the single-user access cluster, even if the performance settings are identical.
Can you verify that the Arrow configuration is still enabled on the shared access cluster?
# Check whether Arrow-based columnar transfers are enabled
spark.conf.get("spark.sql.execution.arrow.pyspark.enabled")

# Enable Arrow-based columnar data transfers
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
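It may also be worth checking the Arrow fallback setting; when fallback is disabled, any Arrow error surfaces directly instead of Spark silently retrying the non-Arrow path. This is a general Spark setting, so treat it as a suggestion rather than a confirmed cause:

# Check whether Spark falls back to the non-Arrow path when Arrow errors occur
spark.conf.get("spark.sql.execution.arrow.pyspark.fallback.enabled")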
Also, please check for any differences in the environment variables or dependencies between the two clusters.
This page covers PySpark-to-pandas conversion and the related Arrow settings:
https://learn.microsoft.com/en-us/azure/databricks/pandas/pyspark-pandas-conversion
# Compare environment variables between the two clusters
import os
print(dict(os.environ))

# List installed Python packages to compare against the single-user cluster
!pip freeze
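To make the comparison concrete, you could also write the package list to a file on each cluster and diff the two files afterwards (the path below is just a placeholder):

# Sketch: capture the installed packages so the output can be saved
# and diffed against the other cluster's list
import subprocess

packages = subprocess.check_output(["pip", "freeze"], text=True)
with open("/tmp/packages_shared_cluster.txt", "w") as f:
    f.write(packages)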
A similar error was discussed in the thread below:
I hope this helps. Please let me know if you have any further questions.