Copy Activity allows HTTP POST to get an Excel file, but it fails when debugging

Steve Dunker 40 Reputation points
2024-03-18T22:51:22.9666667+00:00

I am trying to import an Excel file from a website that requires an HTTP POST to pull the file. I have configured the linked service, the dataset, and the Copy Activity. In the Copy Activity I am able to specify that it use a POST method. When I click Preview Data I can see the data without a problem. I then import the schemas from the source and sink and map just a single field to keep things minimal. When I debug the pipeline, I get the following error.

ErrorCode=ExcelUnsupportedFormat,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Only '.xls' and '.xlsx' format is supported in reading excel file while error is '   at ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputBuffer.ReadLeByte()
   at ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputBuffer.ReadLeInt()
   at ICSharpCode.SharpZipLib.Zip.ZipInputStream.GetNextEntry()
   at NPOI.OpenXml4Net.Util.ZipInputStreamZipEntrySource..ctor(ZipInputStream inp)
   at NPOI.OpenXml4Net.OPC.ZipPackage..ctor(Stream filestream, PackageAccess access)
   at NPOI.OpenXml4Net.OPC.OPCPackage.Open(Stream in1)
   at NPOI.Util.PackageHelper.Open(Stream is1)
   at NPOI.XSSF.UserModel.XSSFWorkbook..ctor(Stream is1)
   at Microsoft.DataTransfer.ClientLibrary.ExcelUtility.GetExcelWorkbook(String fileExtension, TransferStream stream)'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.Zip.ZipException,Message=EOF in header,Source=ICSharpCode.SharpZipLib,'

The error states it is not an Excel file, but when I click their link directly the download opens in Excel without issue. Preview Data also appears to work, since I can see the file's contents.
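For what it's worth, an .xlsx file is just a ZIP archive, so a quick script along the lines of the sketch below (the URL and form fields are placeholders, not the real site) can show whether the POST actually returns a real workbook or something else, such as an HTML login/redirect page or a compressed stream:

import requests

# POST to the endpoint the same way the Copy Activity does and inspect the
# response. A genuine .xlsx body starts with the ZIP signature bytes "PK".
resp = requests.post(
    "https://example.com/reports/export",           # placeholder endpoint
    data={"reportId": "daily", "format": "xlsx"},   # placeholder POST body
    allow_redirects=True,
    timeout=60,
)

print("Status:          ", resp.status_code)
print("Content-Type:    ", resp.headers.get("Content-Type"))
print("Content-Encoding:", resp.headers.get("Content-Encoding"))
print("First bytes:     ", resp.content[:8])

if resp.content[:2] == b"PK":
    print("Body looks like a real .xlsx (ZIP) payload.")
else:
    print("Body is not a ZIP payload - likely what the ADF Excel reader is failing on.")

If the body turns out to be HTML or a gzip-wrapped stream rather than a ZIP, that would line up with the "EOF in header" error from the Excel reader.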

Any ideas on how to fix this? It would be a big time saver, since I need to process these files every day.

Azure Data Factory

Accepted answer
  1. phemanth 5,735 Reputation points Microsoft Vendor
    2024-03-26T06:05:04.4366667+00:00

    @Steve Dunker Here are a few approaches to address this challenge and ensure your ADF pipeline waits for the download completion before proceeding:

    1. Azure Function Monitoring with Application Insights:

    • Integration: Enable Application Insights in your Azure Function. This built-in monitoring tool provides detailed logs about function executions.
    • Function Completion Logs: Azure Functions logs information about function start and completion at the "Information" level. You can configure ADF to wait until it sees the "Function completed" log in Application Insights.
    • Implementation: From your ADF pipeline, query Application Insights for the specific "Function completed" log entry related to your download function (for example, with a Web activity inside an Until loop against the Application Insights query API). Once the query finds that log entry, let the pipeline proceed; a rough sketch of the query logic follows this list.
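    A rough sketch of that completion check, assuming the Function App's telemetry flows into a workspace-based Application Insights resource and using the azure-monitor-query package (the workspace ID and function name are placeholders):

    import time
    from datetime import timedelta

    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient, LogsQueryStatus

    client = LogsQueryClient(DefaultAzureCredential())

    WORKSPACE_ID = "<log-analytics-workspace-id>"   # placeholder

    # The Functions host writes an "Executed 'Functions.<Name>' (Succeeded, ...)" trace
    # when an invocation finishes; "DownloadExcelFile" below is a placeholder name.
    QUERY = """
    AppTraces
    | where Message startswith "Executed 'Functions.DownloadExcelFile'"
    | where Message contains "Succeeded"
    | where TimeGenerated > ago(15m)
    | take 1
    """

    def wait_for_function_completion(poll_seconds=30, max_attempts=20):
        """Return True once the completion trace shows up, False if we give up."""
        for _ in range(max_attempts):
            result = client.query_workspace(WORKSPACE_ID, QUERY, timespan=timedelta(minutes=15))
            if result.status == LogsQueryStatus.SUCCESS and result.tables and result.tables[0].rows:
                return True
            time.sleep(poll_seconds)
        return False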

    2. Azure Blob Storage Monitoring with Event Grid:

    • Concept: Leverage Azure Event Grid, a serverless event routing service. Configure your Azure Blob Storage account to publish an event to Event Grid whenever a new blob (downloaded file) is uploaded.
    • ADF Trigger: Set up an ADF pipeline trigger based on the Event Grid event (see the sketch after this list). This ensures the pipeline starts only when a new blob (the downloaded file) arrives in storage.
    • Drawback: This approach doesn't confirm successful download by the function, only successful upload to storage. Consider additional logging within the function for better tracking.
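    A sketch of setting that trigger up with the azure-mgmt-datafactory package (the subscription, resource group, factory, storage account, and pipeline names are all placeholders; the same trigger can equally be created in the ADF Studio UI):

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        BlobEventsTrigger,
        PipelineReference,
        TriggerPipelineReference,
        TriggerResource,
    )

    SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
    RESOURCE_GROUP = "<resource-group>"     # placeholder
    FACTORY_NAME = "<data-factory-name>"    # placeholder

    adf = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # Fire when a new .xlsx blob lands in the folder the download function writes to.
    trigger = BlobEventsTrigger(
        events=["Microsoft.Storage.BlobCreated"],
        blob_path_begins_with="/downloads/blobs/daily/",   # placeholder container/folder
        blob_path_ends_with=".xlsx",
        scope=(
            "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
            "/providers/Microsoft.Storage/storageAccounts/<storage-account>"
        ),
        pipelines=[
            TriggerPipelineReference(
                pipeline_reference=PipelineReference(reference_name="ProcessDailyExcel")  # placeholder
            )
        ],
    )

    adf.triggers.create_or_update(
        RESOURCE_GROUP, FACTORY_NAME, "NewExcelBlobTrigger", TriggerResource(properties=trigger)
    )
    adf.triggers.begin_start(RESOURCE_GROUP, FACTORY_NAME, "NewExcelBlobTrigger").result()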

    3. Polling with Logic Apps (as a Power Automate alternative):

    • Concept: If you're considering switching away from Power Automate due to its asynchronous nature, explore Azure Logic Apps. They offer built-in waiting capabilities.
    • Implementation: Create a Logic App workflow that triggers the download function. Within the Logic App, use an "Until" loop that polls an Azure Function endpoint returning the download's completion status, until it receives a success response (a sketch of such a status endpoint follows this list). Once successful, the Logic App can then trigger your ADF pipeline.
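    As a sketch, the status endpoint that the Until loop polls could be a small HTTP-triggered function like the one below (Python v1 programming model; the container and marker-blob names are placeholders, and completion is inferred from a marker blob the download function writes when it finishes):

    import json
    import os

    import azure.functions as func
    from azure.storage.blob import BlobServiceClient

    def main(req: func.HttpRequest) -> func.HttpResponse:
        # Look for the marker blob the download function writes once it is done.
        service = BlobServiceClient.from_connection_string(os.environ["AzureWebJobsStorage"])
        marker = service.get_blob_client("downloads", "daily/_download_complete")  # placeholders

        status = "Succeeded" if marker.exists() else "Running"
        return func.HttpResponse(
            json.dumps({"status": status}),
            mimetype="application/json",
            status_code=200,
        )

    The Until loop would call this endpoint on a delay (for example, every minute) and exit once the returned status is "Succeeded", after which the Logic App kicks off the ADF pipeline.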

1 additional answer

  1. Steve Dunker 40 Reputation points
    2024-03-27T15:53:06.4266667+00:00

    Thanks for the options. I was able to get my Power Automate solution to work by adding a management table that I monitor with an Until loop. It's not as elegant as it could be and doesn't allow for parallelization, but it works for my current need. Longer term, I think the right solution is a Durable Function.
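    For the Durable Functions route, the waiting logic could eventually collapse into a small orchestration along these lines (a sketch using the azure-functions-durable Python library; the activity names are placeholders):

    import azure.durable_functions as df

    def orchestrator_function(context: df.DurableOrchestrationContext):
        # Run the download and only continue once it has actually returned.
        blob_path = yield context.call_activity("DownloadExcelFile", context.get_input())
        # Hand the landed file to the activity that starts the ADF pipeline run.
        run_id = yield context.call_activity("StartAdfPipeline", blob_path)
        return run_id

    main = df.Orchestrator.create(orchestrator_function)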
