Excel format in Data Factory in Microsoft Fabric
This article outlines how to configure Excel format in the data pipeline of Data Factory in Microsoft Fabric.
Supported capabilities
Excel format is supported for the following activities and connectors as source.
Category | Connector/Activity |
---|---|
Supported connector | Amazon S3 |
Amazon S3 Compatible | |
Azure Blob Storage | |
Azure Data Lake Storage Gen1 | |
Azure Data Lake Storage Gen2 | |
Azure Files | |
File system | |
FTP | |
Google Cloud Storage | |
HTTP | |
Lakehouse Files | |
Oracle Cloud Storage | |
SFTP | |
Supported activity | Copy activity (source/-) |
Lookup activity | |
GetMetadata activity | |
Delete activity |
Excel format in copy activity
To configure Excel format, choose your connection in the source of data pipeline copy activity, and then select Excel in the drop-down list of File format. Select Settings for further configuration of this format.
Excel as source
After choosing Excel format, following properties are shown up.
Worksheet mode: Select the worksheet mode that you want to use to read Excel data. Choose Name or Index.
Name: When you choose Name, in Sheet name section, select the Excel worksheet name to read data, or select Edit to specify the worksheet name manually. If you point to a folder or multiple files, make sure this particular worksheet exists in all those files.
Index: When you choose Index, in Sheet index section, select the Excel worksheet index to read data, or select Edit to specify the worksheet name manually. The data read start from 0. If there is worksheet added or deleted from excel file, the index of existed worksheets will change automatically.
After selecting Settings in File format section, following properties are shown up in the pop-up File format settings dialog box.
Compression type: The compression codec used to read Excel files. You can choose from None, bzip2, gzip, deflate, ZipDeflate, TarGzip or tar type in the drop-down list.
Compression level: Specify the compression ratio when you select a compression type. You can choose from Optimal or Fastest.
- Fastest: The compression operation should complete as quickly as possible, even if the resulting file is not optimally compressed.
- Optimal: The compression operation should be optimally compressed, even if the operation takes a longer time to complete. For more information, see Compression Level topic.
Range: The cell range in the given worksheet to locate the selective data, e.g.:
- Not specified: reads the whole worksheet as a table from the first non-empty row and column.
A3
: reads a table starting from the given cell, dynamically detects all the rows below and all the columns to the right.A3:H5
: reads this fixed range as a table.A3:A3
: reads this single cell.
Null value: Specifies the string representation of null value. The default value is empty string.
First row as header: Specifies whether to treat the first row in the given worksheet/range as a header line with names of columns. Unselected by default.
Table summary
Excel as source
The following properties are supported in the copy activity Source section when using Excel format.
Name | Description | Value | Required | JSON script property |
---|---|---|---|---|
File format | The file format that you want to use. | Excel | Yes | type (under datasetSettings ):Excel |
Worksheet mode | The worksheet mode that you want to use to read Excel data. | - Name - Index |
Yes | - sheetName - sheetIndex |
Compression type | The compression codec used to read Excel files. | Choose from: None bzip2 gzip deflate ZipDeflate TarGzip tar |
No | type (under compression ): bzip2 gzip deflate ZipDeflate TarGzip tar |
Compression level | The compression ratio. Allowed values are Optimal or Fastest. | Optimal or Fastest | No | level (under compression ): Fastest Optimal |
Range | The cell range in the given worksheet to locate the selective data. | <your cell range> | No | range |
Null value | The string representation of null value. | <your null value> empty string (by default) |
No | nullValue |
First row as header | Specifies whether to treat the first row in the given worksheet/range as a header line with names of columns. | Selected or unselected | No | firstRowAsHeader: true or false (default) |