Cannot Read XLS file Error 21352

Colinn Calaguas 45 Reputation points
2024-01-04T19:22:17.45+00:00

We are receiving 665 XLS files every month from our Client and we are trying to insert these data into our Database but sadly (I'm not yet sure) ADF cannot work on XLS (source) and DB (sink) in Copy Data activity. My work around on this is to convert the XLS to CSV but unfortunately whenever I try to read 1 file it's giving me this error.

Error 3

When I checked the error details

Error Code: 21352
 
Details: Only '.xls' and '.xlsx' format is supported in reading excel file while error is '   at NPOI.POIFS.Storage.HeaderBlock..ctor(Stream stream)
   at NPOI.POIFS.FileSystem.POIFSFileSystem..ctor(Stream stream)
   at Microsoft.DataTransfer.ClientLibrary.ExcelUtility.GetExcelWorkbook(String fileExtension, TransferStream stream)'.
Invalid header signature; read 0x4E205943494C4F50, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document

But when I checked the property, it is an Excel file

Error 1

Also when I tried opening the file manually using Excel it is giving me this prompt but whenever I click YES it just opens the file with no issues

Error 2

I'm not sure what's causing this but maybe there's a workaround on this? Really appreciate the help.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,640 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 33,631 Reputation points Volunteer Moderator
    2024-01-04T20:15:03.49+00:00

    Based on this thread :

    1. Export or Convert to Flat Files: Before transferring to the cloud, convert the Excel files into more manageable formats like CSV, tab-delimited, or pipe-delimited files. This is the most straightforward method, though it requires altering your current process.
    2. Extract Data via XML Shredding: As Excel files can be treated as XML, you can create a custom task to open the Excel file in XML format and then extract the required data.
    3. Utilize SSIS Packages in Azure Data Factory: With the introduction of the "Execute SSIS package" activity in Azure Data Factory, it's now possible to use SSIS packages, which offer better Excel file support, including a Connection Manager. This could involve creating an SSIS package specifically for handling Excel files and integrating it within Azure Data Factory version 2 (ADFv2). Note, however, that this method is speculative and untested, and it would require setting up an Integration Runtime (IR) for running SSIS in ADFv2.
    4. Leverage Databricks: Consider using Databricks to read Excel files. While initiating a Spark cluster just for a few Excel files might be excessive, it's a viable option if Spark is already a part of your architecture. You can find examples of how to do this here.

    More links :

    https://learn.microsoft.com/en-us/sql/relational-databases/import-export/import-data-from-excel-to-sql?view=sql-server-2017#adf

    https://stackoverflow.com/questions/44109957/loading-data-from-excel-file-into-azure-datawarehouse

    https://learn.microsoft.com/en-us/azure/data-factory/how-to-invoke-ssis-package-ssis-activity

    https://github.com/Azure/AzureDataLake/tree/master/Samples/ExcelExtractor


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.