Unable to create a dataframe from an Excel file in Azure Databricks (UC-enabled workspace)

Vishal D 5 Reputation points
2025-04-11T13:14:01.3566667+00:00

Hello,

After adding the Maven library com.crealytics:spark-excel_2.12:0.13.5 to the artifact allowlist, I installed it at the Azure Databricks cluster level (shared, Unity Catalog enabled, runtime 15.4). Then I tried to create a df for the Excel file as follows:

df = spark.read.format('com.crealytics.spark.excel').option('header',True).load('abfss://******@dlake.dfs.core.windows.net/myfiles/file1.xlsx')

While trying to display the above df, or to see the columns using df.columns, I got the following error:

[INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission SELECT on any file. SQLSTATE: 42501
File <command-6451936430536458>, line 1
----> 1 df.columns

File /databricks/spark/python/pyspark/sql/connect/client/core.py:2155, in SparkConnectClient._handle_rpc_error(self, rpc_error)
   2140         raise Exception(
   2141             "Python versions in the Spark Connect client and server are different. "
   2142             "To execute user-defined functions, client and server should have the "
   (...)
   2151             "https://docs.databricks.com/en/release-notes/serverless.html."
   2152         )
   2153     # END-EDGE
-> 2155     raise convert_exception(
   2156         info,
   2157         status.message,
   2158         self._fetch_enriched_error(info),
   2159         self._display_server_stack_trace(),
   2160     ) from None
   2162 raise SparkConnectGrpcException(status.message) from None
   2163 else

Additional information:

  1. Since it's a Unity Catalog-enabled workspace, I'm using an external location rather than a traditional mount point for the ADLS Gen2 storage where the Excel files are present. In a different (non-Unity Catalog) workspace, I was able to create the df successfully with the same code, but there the file location was a mount point path.
  2. I also tried reading the Excel file from a Volume; still the same error.
  3. I was able to create a df for the CSV and JSON files in the same ADLS Gen2 as follows,
df = spark.read.format('csv').option('header',True).load('abfss://******@dlake.dfs.core.windows.net/bom/validity/BOM_VALIDITY.csv')

I have no issues with that and was able to display the df. I have the following permissions on the external location: CREATE EXTERNAL TABLE, READ FILES, and WRITE FILES.

What's wrong here? Kindly help me in this regard. TIA.

Azure Databricks

2 answers

  1. Vinodh247 40,296 Reputation points MVP Volunteer Moderator
    2025-04-11T15:48:32.0966667+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    This is a known issue when using third-party libraries like com.crealytics:spark-excel in Unity Catalog-enabled workspaces on Azure Databricks. Here is a breakdown of what is going wrong and how to work around it:

    Unity Catalog enforces fine-grained access control over file operations, and third-party libraries like spark-excel do not integrate natively with Unity Catalog. So, even if:

    • you have SELECT, READ FILES, and WRITE FILES on the external location, and

    • you can read CSV and JSON using native Spark readers,

    you will still get [INSUFFICIENT_PERMISSIONS] errors when using a non-native source like spark-excel, because it bypasses the UC governance layer and Unity Catalog blocks it.

    Points to note:

    1. Native formats (like Parquet, CSV, JSON) are UC-aware and work as expected.
    2. Third-party formats (like Excel, Avro with custom libraries, JDBC-based readers) are not UC-aware, and they hit access control walls.
    3. Unity Catalog does not currently support direct file access via custom Spark datasources; the contrast is sketched below.
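
    To make the contrast concrete with the question's own code (the container name is elided as in the question and shown here with a placeholder):

    # Native reader: UC-aware, succeeds with READ FILES on the external location
    df_csv = (
        spark.read.format('csv')
        .option('header', True)
        .load('abfss://<container>@dlake.dfs.core.windows.net/bom/validity/BOM_VALIDITY.csv')
    )

    # Third-party reader: not UC-aware, fails with [INSUFFICIENT_PERMISSIONS]
    # as soon as an action such as display() or df.columns runs
    df_xlsx = (
        spark.read.format('com.crealytics.spark.excel')
        .option('header', True)
        .load('abfss://<container>@dlake.dfs.core.windows.net/myfiles/file1.xlsx')
    )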

    Option 1: Read Excel in Driver and Parallelize

    This bypasses Unity Catalog enforcement because you do not use Spark’s distributed read directly:

    
    import pandas as pd
    
    # Local read using pandas (requires the openpyxl package for .xlsx files)
    local_df = pd.read_excel('/dbfs/mnt/myfiles/file1.xlsx')
    
    # Convert to Spark DataFrame
    df = spark.createDataFrame(local_df)
    df.show()
    

    Prerequisites:

    • The file must be accessible under /dbfs/....

    • That means you should copy it from ADLS to DBFS first using:

        dbutils.fs.cp("abfss://****@dlake.dfs.core.windows.net/myfiles/file1.xlsx", "dbfs:/mnt/myfiles/file1.xlsx")

    Note: This reads the data on the driver, so it is not suitable for very large Excel files.
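
    If the Excel files are already in a Unity Catalog volume (as tried in point 2 of the question), the same driver-side pattern may work without the DBFS copy, since volumes expose POSIX-style paths to driver code. A minimal sketch, assuming a hypothetical volume path and that the openpyxl package is installed on the cluster:

    import pandas as pd

    # Hypothetical volume path; replace with your own catalog/schema/volume
    volume_path = '/Volumes/main/default/myfiles/file1.xlsx'

    # Driver-side read with pandas; no custom Spark datasource is involved,
    # so the spark-excel permission issue does not apply
    local_df = pd.read_excel(volume_path)

    # Convert to a Spark DataFrame for distributed processing
    df = spark.createDataFrame(local_df)
    df.show()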


    Option 2: Use a Non Unity Catalog Cluster

    If the Excel processing is critical and large-scale, consider spinning up a standard (non-UC) cluster just for this step. Once read, save the data as Delta or Parquet and then consume it in the UC workspace, as sketched below.
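
    A minimal sketch of that handoff, assuming the spark-excel library is installed on the non-UC cluster and using illustrative placeholder paths:

    # --- On the non-UC cluster: read the Excel file and land it in a native format ---
    raw_df = (
        spark.read.format('com.crealytics.spark.excel')
        .option('header', True)
        .load('abfss://<container>@dlake.dfs.core.windows.net/myfiles/file1.xlsx')    # illustrative path
    )
    raw_df.write.mode('overwrite').parquet(
        'abfss://<container>@dlake.dfs.core.windows.net/staging/file1_parquet'        # illustrative path
    )

    # --- In the UC-enabled workspace: read the Parquet with a native, UC-aware reader ---
    df = spark.read.parquet('abfss://<container>@dlake.dfs.core.windows.net/staging/file1_parquet')
    df.show()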


    Best Practices

    • Store Excel files as intermediate formats (Parquet, Delta) post-ingestion.

    • For ingestion, use a pipeline step (like in ADF or on a non-UC cluster).

    • Unity Catalog is evolving, but currently it limits non-native IO patterns.

    Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.


  2. PRADEEPCHEEKATLA 91,656 Reputation points Moderator
    2025-05-20T18:54:40.6533333+00:00

    Vishal D - Thanks for the question and for using the MS Q&A platform.

    This is expected behavior with Unity Catalog in Azure Databricks.

    While investigating the same issue myself, I was able to find a couple of solutions.

    Solution: You should perform the write operation on a local disk and copy the results to a Unity Catalog volume. You can use /local_disk0/tmp for all cluster types. Other paths may fail.

    First option: You can follow the knowledge base article Databricks has provided: Trying to write Excel files to a Unity Catalog volume fails.

    Note: This works when you have a single DataFrame that you want to write as an Excel file to a volume in Unity Catalog, as sketched below.
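
    A minimal sketch of that pattern, assuming the openpyxl package is installed and using illustrative file and volume names:

    import shutil

    # Write the Excel file to local disk first; /local_disk0/tmp works on all cluster types
    local_path = '/local_disk0/tmp/report.xlsx'            # illustrative file name
    df.toPandas().to_excel(local_path, index=False)        # requires openpyxl for .xlsx

    # Then copy the result into a Unity Catalog volume (illustrative volume path)
    volume_path = '/Volumes/main/default/shared/report.xlsx'
    shutil.copy(local_path, volume_path)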

    Second option: The shutil module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal.

    Here I'm using an Excel file with multiple sheets: I first saved the file to the workspace location (or you can use a DBFS folder), then used the shutil module to copy it to the Unity Catalog volume.

    import shutil
    
    # Define the source and destination paths
    # output_path points to the multi-sheet Excel file written earlier to the workspace (or DBFS) location
    source_path = output_path
    # ring_number is set earlier in the notebook; the volume path below keeps the original placeholders
    destination_path = f'/Volumes/shared_datalake_ring{ring_number}/default/shared/Projects/xxx/xxxxxxx/Shared/InventoryTargets.xlsx'
    
    # Copy the Excel file to the Unity Catalog volume
    shutil.copy(source_path, destination_path)
    
    print(f"Excel file has been successfully copied to {destination_path}.")
    
    

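    For completeness, a hypothetical sketch of how output_path could be produced with multiple sheets before the copy above (assumes the openpyxl package is installed; the DataFrames and sheet names are illustrative):

    import pandas as pd

    # Write the multi-sheet Excel file to local disk first
    # (per the note above, /local_disk0/tmp works on all cluster types)
    output_path = '/local_disk0/tmp/InventoryTargets.xlsx'    # illustrative path

    with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
        targets_df.toPandas().to_excel(writer, sheet_name='Targets', index=False)    # illustrative DataFrames
        actuals_df.toPandas().to_excel(writer, sheet_name='Actuals', index=False)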

    Hope this helps.

