Copy Activity - Error with HTTP Request to Download CSV with gzip

Christopher Mühl 106 Reputation points
2022-05-09T20:34:52.377+00:00

Hello Community,

I have a REST API call to download a CSV file in Postman. (see screenshot).

200442-adf-1.png

I would like to build this request in Azure Data Factory to automatically store this CSV in Azure Data Lake Storage.

In the pipeline, I first execute a few HTTP requests to authenticate myself and prepare for the download.
Then I want to use a Copy Activity to save the CSV to the Azure Data Lake Storage via the REST API call.
The REST API call does not need any additional headers or values in the body.

After a successful request in Postman, I get the following response headers.
Here I can see that the CSV was compressed with gzip.

200424-adf-2.png

Therefore, I have configured the following as the source of the Copy Activity:
200432-adf-3.png

However, with this configuration I get the following error message:

ErrorCode=InvalidDataFormat,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The source data has an invalid format. Cannot decompress the source data. Source file name: '../../../xxxx'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.GZip.GZipException,Message=Error GZIP header, first magic byte doesn't match,Source=ICSharpCode.SharpZipLib,'

What could be the reason for this? Where is the difference?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. MartinJaffer-MSFT 26,031 Reputation points
    2022-05-10T20:51:33.27+00:00

    Hello @Christopher Mühl and welcome to Microsoft Q&A.

    As I understand, you are either having trouble in forming a request, or trouble in decompressing / unzipping the result of said request. You are using HTTP>DelimitedText on the Source side and Blob>DelimitedText on the sink side.

    Might I suggest an experiment? Break this up into 2 steps. Instead of downloading as Delimited Text, try downloading as Binary. Then, in a separate operation, unzip and store as CSV.

    The point of this being to determine:

    1. Is the result actually a valid file?
    2. Is there a problem with the compression?
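
Once the raw bytes have been saved by a Binary copy, the second step of this experiment can also be tried locally. The following is only a minimal Python sketch of that check, not something Data Factory runs itself; the function name try_gunzip is hypothetical and the sample CSV bytes are made up for illustration.

```python
import gzip
import zlib


def try_gunzip(raw: bytes):
    """Attempt to decompress raw as gzip.

    Returns (True, decompressed_bytes) on success,
    or (False, raw) unchanged if raw is not valid gzip data.
    """
    try:
        return True, gzip.decompress(raw)
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile is a subclass of OSError; truncated or
        # corrupted streams can surface as EOFError or zlib.error.
        return False, raw


# Made-up sample data: one gzip-compressed CSV and one plain CSV.
sample_csv = b"id,name\n1,a\n"
compressed = gzip.compress(sample_csv)

ok_compressed, out_compressed = try_gunzip(compressed)
ok_plain, out_plain = try_gunzip(sample_csv)
```

If the call fails on the downloaded bytes but the file opens fine as plain text, the data was probably never gzip-compressed when it reached the copy step.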

    There were a few possibilities that crossed my mind.

    • If your REST call doesn't need any additional headers, then how is the request correlated with your login? Maybe you got back an error response, and that is what couldn't be unzipped.
    • Does the call return both a body and an attachment? Are we certain which one is being saved?
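
To test the first possibility, one could look at how the saved bytes begin. The sketch below is a rough heuristic, assuming an error response would arrive as JSON or HTML/XML; the function name guess_body_kind and the sample bodies are hypothetical.

```python
def guess_body_kind(raw: bytes) -> str:
    """Heuristically label a response body by its first non-whitespace byte."""
    head = raw.lstrip()[:1]
    if head in (b"{", b"["):
        return "json"          # often an API error object
    if head == b"<":
        return "html-or-xml"   # often a login or error page
    return "other"             # could be CSV, compressed data, or anything else


# Made-up sample bodies for illustration.
kind_error = guess_body_kind(b'  {"error": "unauthorized"}')
kind_page = guess_body_kind(b"<html><body>Login</body></html>")
kind_csv = guess_body_kind(b"id,name\n1,a\n")
```

A result of "json" or "html-or-xml" for a download that should be CSV would suggest the request was not authenticated or not formed as intended.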

    The magic bytes are the first two bytes of every gzip stream (0x1f 0x8b) and identify the data as gzip. If the first magic byte doesn't match, then check for corruption, check which type of compression was actually used, and check that the data is compressed at all.
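
As an illustration of that check, the leading bytes of the saved file can be compared against a few well-known signatures. This is a minimal sketch: the function name sniff_compression and the short signature table are hypothetical and far from exhaustive.

```python
# Well-known leading signatures of a few compressed formats.
SIGNATURES = [
    (b"\x1f\x8b", "gzip"),
    (b"PK\x03\x04", "zip"),
    (b"BZh", "bzip2"),
]


def sniff_compression(raw: bytes) -> str:
    """Return the name of the first signature that raw starts with."""
    for magic, name in SIGNATURES:
        if raw.startswith(magic):
            return name
    return "unknown-or-uncompressed"


# Made-up leading bytes for illustration.
kind_gzip = sniff_compression(b"\x1f\x8b\x08\x00")
kind_zip = sniff_compression(b"PK\x03\x04\x14\x00")
kind_plain = sniff_compression(b"id,name\n1,a\n")
```

A result other than "gzip" on the downloaded bytes would match the "first magic byte doesn't match" error and would point at the compression setting on the source dataset rather than at corruption.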
