Hi,
Thanks for reaching out to Microsoft Q&A.
This is a known issue when using third-party libraries such as com.crealytics.spark-excel in Unity Catalog-enabled workspaces on Azure Databricks. Here is a breakdown of what is going wrong and how to fix it:
Unity Catalog enforces fine-grained access control over file operations, and third-party libraries like spark-excel do not integrate natively with it. So even if:
- you have SELECT, READ FILES, and WRITE FILES on the external location, and
- you can read CSV and JSON using the native Spark readers,
you will still get [INSUFFICIENT_PERMISSIONS] errors when using a non-native datasource like spark-excel, because it bypasses the UC governance layer and Unity Catalog blocks the file access.
Points to note:
- Native formats (like Parquet, CSV, JSON) are UC-aware and work as expected.
- Third-party formats (like Excel, Avro with custom libraries, JDBC-based readers) are not UC-aware, and they run into access-control errors.
- Unity Catalog does not currently support direct file access via custom Spark datasources.
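For illustration, the kind of read that fails looks like this. This is a hypothetical sketch, not code from the question: `spark` is the SparkSession a Databricks notebook provides, and the path argument is a placeholder.

```python
# Hypothetical sketch of a distributed Excel read via the third-party
# spark-excel datasource. `spark` is the SparkSession provided by a
# Databricks notebook; `path` would be an abfss:// or dbfs:/ location.
def read_excel_with_spark(spark, path):
    return (spark.read
            .format("com.crealytics.spark.excel")
            .option("header", "true")
            .load(path))

# On a UC-enabled cluster this call fails with [INSUFFICIENT_PERMISSIONS],
# because the custom datasource bypasses Unity Catalog's governance layer.
```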
Option 1: Read Excel on the Driver and Parallelize
This avoids Unity Catalog enforcement because the file is read locally on the driver with pandas rather than through Spark's distributed datasource API:
import pandas as pd
# Local read on the driver using pandas
# (reading .xlsx files requires the openpyxl package)
local_df = pd.read_excel('/dbfs/mnt/myfiles/file1.xlsx')
# Convert the pandas DataFrame to a Spark DataFrame
df = spark.createDataFrame(local_df)
df.show()
Prerequisites:
- The file must be accessible under a /dbfs/... path.
- If it lives in ADLS, copy it to DBFS first:
dbutils.fs.cp("abfss://****@dlake.dfs.core.windows.net/myfiles/file1.xlsx", "dbfs:/mnt/myfiles/file1.xlsx")
Note: This reads the data on the driver, so it is not suitable for very large Excel files.
Option 2: Use a Non-Unity Catalog Cluster
If the Excel processing is critical and large-scale, consider spinning up a standard (non-UC) cluster just for this step. Once the Excel file is read, save it as a Delta or Parquet table and then consume it from the UC-enabled workspace.
Best Practices
- Store ingested Excel data in intermediate formats (Parquet, Delta) post-ingestion, rather than re-reading the Excel file.
- Handle the ingestion in a dedicated pipeline step (for example in ADF, or on a non-UC cluster).
- Unity Catalog is evolving, but it currently limits non-native I/O patterns.
Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.