Read and stream Excel files

Important

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.

Azure Databricks includes built-in support for reading .xls and .xlsx files, eliminating the need for external libraries or manual file conversions. You can read any sheet from a multi-sheet workbook, target specific cell ranges, automatically infer schema and data types, and work with formula values as their computed results. Excel files can be read from cloud storage or uploaded directly in the Add Data UI, and support both batch and streaming workloads using Auto Loader.

Prerequisites

Reading and streaming Excel files requires Databricks Runtime 17.1 or above and Auto Loader for streaming workloads.

Options

Use the .option() and .options() methods of DataFrameReader to configure Excel data sources. For a complete list of supported options, see DataFrameReader Excel options and DataFrameWriter Excel options.

Usage

The following examples demonstrate reading Excel files using the Spark batch (spark.read) and streaming APIs. By default, the parser reads all cells from the top-left to the bottom-right non-empty cell in the first sheet; use the dataAddress option to target a specific sheet or cell range. Schema is inferred automatically, or you can specify your own.

Create or modify a table in the UI

You can use the Create or modify table UI to create tables from Excel files. Start by uploading an Excel file or selecting an Excel file from a volume or an external location. Choose the sheet, adjust the number of header rows, and optionally specify a cell range. The UI supports creating a single table from the selected file and sheet.

Read Excel files

You can read an Excel file from cloud storage (for example, S3, ADLS) using spark.read.excel or SQL's read_files function.

Python

# Read the first sheet from a single Excel file or from multiple Excel files in a directory
df = (spark.read.excel(<path to excel directory or file>))

# Infer schema field name from the header row
df = (spark.read
       .option("headerRows", 1)
       .excel(<path to excel directory or file>))

# Read a specific sheet and range
df = (spark.read
       .option("headerRows", 1)
       .option("dataAddress", "Sheet1!A1:E10")
       .excel(<path to excel directory or file>))

SQL

-- Read an entire Excel file
CREATE TABLE my_table AS
SELECT * FROM read_files(
  "<path to excel directory or file>",
  schemaEvolutionMode => "none"
);

-- Read a specific sheet and range
CREATE TABLE my_sheet_table AS
SELECT * FROM read_files(
  "<path to excel directory or file>",
  format => "excel",
  headerRows => 1,
  dataAddress => "Sheet1!A2:D10",
  schemaEvolutionMode => "none"
);

Stream Excel files using Auto Loader

You can stream Excel files using Auto Loader by setting cloudFiles.format to excel. For example:

df = (
  spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "excel")
    .option("cloudFiles.inferColumnTypes", True)
    .option("headerRows", 1)
    .option("cloudFiles.schemaLocation", "<path to schema location dir>")
    .option("cloudFiles.schemaEvolutionMode", "none")
    .load(<path to excel directory or file>)
)
df.writeStream
  .format("delta")
  .option("mergeSchema", "true")
  .option("checkpointLocation", "<path to checkpoint location dir>")
  .table(<table name>)

Ingest Excel files using `COPY INTO`

Use COPY INTO to load Excel files from cloud storage into a Delta table idempotently.

CREATE TABLE IF NOT EXISTS excel_demo_table;

COPY INTO excel_demo_table
FROM "<path to excel directory or file>"
FILEFORMAT = EXCEL
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');

List sheets

You can list the sheets in an Excel file using the listSheets operation. The returned schema is a struct with the following fields:

sheetIndex: long
sheetName: String

For example:

Python

# List the name of the Sheets in an Excel file
df = (spark.read.format("excel")
       .option("operation", "listSheets")
       .load(<path to excel directory or file>))

SQL

SELECT * FROM read_files("<path to excel directory or file>",
  schemaEvolutionMode => "none",
  operation => "listSheets"
)

Parse complex non-structured Excel sheets

For complex, non-structured Excel sheets (for example, multiple tables per sheet, data islands), Databricks recommends extracting the cell ranges you need to create your Spark DataFrames using the dataAddress options.

df = (spark.read.format("excel")
       .option("headerRows", 1)
       .option("dataAddress", "Sheet1!A1:E10")
       .load(<path to excel directory or file>))

Limitations

Password-protected files are not supported.
Only one header row is supported.
Merged cell values only populate the top-left cell. Remaining child cells are set to NULL.
Streaming Excel files using Auto Loader is supported, but schema evolution is not. You must explicitly set schemaEvolutionMode="None".
"Strict Open XML Spreadsheet (Strict OOXML)" is not supported.
Macro execution in .xlsm files is not supported.
The ignoreCorruptFiles option is not supported.

FAQ

Find answers to frequently asked questions about the Excel connector in Lakeflow Connect.

Can I read all sheets at once?

The parser reads only one sheet from an Excel file at a time. By default, it reads the first sheet. You can specify a different sheet using the dataAddress option. To process multiple sheets, first retrieve the list of sheets by setting the operation option to listSheets, then iterate over the sheet names and read each one by providing its name in the dataAddress option.

Can I ingest Excel files with complex layouts or multiple tables per sheet?

By default, the parser reads all Excel cells from the top-left cell to the bottom-right non-empty cell. You can specify a different cell range using the dataAddress option.

How are formulas and merged cells handled?

Formulas are ingested as their computed values. For merged cells, only the top-left value is retained (child cells are NULL).

Can I use Excel ingestion in Auto Loader and streaming jobs?

Yes, you can stream Excel files using cloudFiles.format = "excel". However, schema evolution is not supported, so you must set "schemaEvolutionMode" to "None".

Is password-protected Excel supported?

No. If this functionality is critical to your workflows, contact your Databricks account representative.

Additional resources

Read and write CSV files: If your data source can export to CSV, CSV is a simpler format with broader tooling support and no dependency on a dedicated parser.

คำติชม

หน้านี้มีประโยชน์หรือไม่

Last updated on 2026-06-15

Read and stream Excel files

Prerequisites

Options

Usage

Create or modify a table in the UI

Read Excel files

Python

SQL

Stream Excel files using Auto Loader

Ingest Excel files using COPY INTO

List sheets

Python

SQL

Parse complex non-structured Excel sheets

Limitations

FAQ

Additional resources

คำติชม

แหล่งทรัพยากรเพิ่มเติม

Ingest Excel files using `COPY INTO`