Copy Activity - Error with HTTP Request to Download CSV with gzip

Question

Hello Community,

I have a REST API call that downloads a CSV file in Postman (see screenshot).

I would like to build this request in Azure Data Factory to automatically store this CSV in Azure Data Lake Storage.

In the pipeline, I first execute a few HTTP requests to authenticate and prepare for the download.
Then I want to use a Copy Activity to call the REST API and save the CSV to Azure Data Lake Storage.
The REST API call does not need any additional headers or values in the body.

After a successful request in Postman, I get the following response headers.
Here I can see that the CSV was compressed with gzip.
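HTTP clients such as Postman typically decode a Content-Encoding: gzip body before displaying it, so the response shown in Postman may not tell you whether the bytes reaching the Copy Activity are still compressed. The following Python sketch (standard library only; the CSV string is made-up sample data) shows that decompressing the wire body once recovers the CSV, while decompressing a body that is already plain text fails on the gzip header check, the same kind of failure that is reported as "first magic byte doesn't match".

```python
import gzip

# Made-up CSV content standing in for the real export.
csv_text = "id,name\n1,Alice\n2,Bob\n"

# Roughly what travels over the wire with Content-Encoding: gzip.
wire_body = gzip.compress(csv_text.encode("utf-8"))
print(wire_body[:2].hex())  # 1f8b, the gzip magic bytes

# Decoding once recovers the original CSV.
decoded = gzip.decompress(wire_body).decode("utf-8")
print(decoded == csv_text)  # True

# Decoding a body that is already plain text fails on the header check,
# the same class of error as "first magic byte doesn't match" in ADF.
try:
    gzip.decompress(decoded.encode("utf-8"))
except gzip.BadGzipFile as exc:
    print("second decode failed:", exc)
```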

Therefore, I have configured the following as the source of the Copy Activity:

However, with this configuration I get the following error message:

ErrorCode=InvalidDataFormat,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The source data has an invalid format. Cannot decompress the source data. Source file name: '../../../xxxx'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.GZip.GZipException,Message=Error GZIP header, first magic byte doesn't match,Source=ICSharpCode.SharpZipLib,'

What could be the reason for this? What is different between the Postman request and the Copy Activity?

Accepted Answer

Hello @Christopher Mühl and welcome to Microsoft Q&A.

As I understand it, you are having trouble either forming the request or decompressing the result of that request. You are using HTTP>DelimitedText on the Source side and Blob>DelimitedText on the Sink side.

Might I suggest an experiment? Break this up into two steps. Instead of downloading as DelimitedText, download as Binary. Then, in a second operation, decompress the file and store it as CSV.
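Once the Binary copy has landed, the second step can also be tried locally to see whether the file decompresses into sensible CSV. This is a minimal Python sketch, assuming the downloaded file has been read into memory as bytes and is UTF-8 encoded; the function name binary_to_csv_rows is hypothetical. It only calls gzip.decompress when the payload starts with the gzip magic bytes, so an uncompressed download is passed through instead of raising.

```python
import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"


def binary_to_csv_rows(raw: bytes, encoding: str = "utf-8") -> list[list[str]]:
    """Hypothetical step 2: turn a Binary download into parsed CSV rows."""
    # Decompress only if the payload really starts with the gzip magic bytes.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    text = raw.decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Usage with a locally generated, hypothetical payload.
payload = gzip.compress(b"id,name\n1,Alice\n")
print(binary_to_csv_rows(payload))  # [['id', 'name'], ['1', 'Alice']]
```

Passing non-gzip payloads through is a choice made for diagnosis; in a production pipeline you might prefer to fail loudly instead.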

The point of this is to determine:

Whether the result is an actual, valid file
Whether there is a problem with the compression
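Both questions can be answered by inspecting the downloaded bytes directly. The sketch below is a hypothetical helper (diagnose_download is not part of any Azure SDK) that uses only Python's gzip module to distinguish an empty download, a payload that is not gzip at all, a gzip stream that is truncated or corrupt, and a valid gzip file.

```python
import gzip
import zlib


def diagnose_download(raw: bytes) -> str:
    """Hypothetical helper: is this a valid file, and is the gzip intact?"""
    if not raw:
        return "empty: no file content was received"
    if raw[:2] != b"\x1f\x8b":
        return "not gzip: the payload lacks the gzip magic bytes 1f 8b"
    try:
        gzip.decompress(raw)
    except EOFError:
        # The stream ends before the compressed data and trailer are complete.
        return "gzip but truncated: the stream ends early"
    except (OSError, zlib.error) as exc:
        # gzip.BadGzipFile (an OSError subclass) covers bad headers and CRC mismatches.
        return f"gzip but corrupt: {exc}"
    return "valid gzip"


# Usage with a locally generated, hypothetical payload.
sample = gzip.compress(b"id,name\n1,Alice\n")
print(diagnose_download(sample))       # valid gzip
print(diagnose_download(sample[:-4]))  # gzip but truncated: the stream ends early
```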

There were a few possibilities that crossed my mind.

If your REST call doesn't need any additional headers, how is the request correlated with your login? Maybe you got back an error response, and that couldn't be decompressed.
Does the call return both a body and an attachment? Are we certain which one is being saved?
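The first possibility can be checked once the response has been saved as Binary: if the server returned an error or login page instead of the file, the payload usually starts with readable HTML or JSON rather than gzip bytes. The Python heuristic below is a sketch under that assumption; sniff_response and preview are hypothetical helper names, and the labels are guesses meant to prompt manual inspection, not a definitive classification.

```python
def sniff_response(raw: bytes) -> str:
    """Heuristic guess at what a supposed 'CSV download' actually contains."""
    if raw[:2] == b"\x1f\x8b":
        return "gzip"
    # Ignore a UTF-8 byte-order mark and leading whitespace before sniffing text.
    head = raw[:512].removeprefix(b"\xef\xbb\xbf").lstrip()
    if head[:1] == b"<":
        return "html/xml: possibly an error or login page"
    if head[:1] == b"{":
        return "json: possibly an API error object"
    return "other: inspect the preview manually"


def preview(raw: bytes, n: int = 200) -> str:
    """First n bytes as text, with undecodable bytes replaced, for eyeballing."""
    return raw[:n].decode("utf-8", errors="replace")


# Usage with a hypothetical error payload.
error_body = b'{"error": "unauthorized"}'
print(sniff_response(error_body))  # json: possibly an API error object
print(preview(error_body))
```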

The magic bytes are part of the gzip format and identify the type of compression: every gzip stream starts with the two bytes 0x1F 0x8B. If the magic bytes don't match, check the file for corruption, check which type of compression was used, and check that it is compressed at all.
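To check which type of compression (if any) was used, you can compare the first bytes of the downloaded file against well-known signatures: 1F 8B for gzip, PK\x03\x04 for zip, "BZh" for bzip2, and the RFC 1950 header rule for zlib. The Python sketch below assumes the file has been saved as Binary and read into memory; detect_compression is a hypothetical name, and the zlib check is a heuristic that can misfire on some plain-text files.

```python
import bz2
import gzip
import io
import zipfile
import zlib


def detect_compression(raw: bytes) -> str:
    """Guess the compression format from leading magic bytes (heuristic)."""
    if raw.startswith(b"\x1f\x8b"):
        return "gzip"
    if raw.startswith(b"PK\x03\x04"):
        return "zip"
    if raw.startswith(b"BZh"):
        return "bzip2"
    # RFC 1950 zlib header: method 8 (deflate), window size <= 32K,
    # and the first two bytes read as a big-endian number divisible by 31.
    if (
        len(raw) >= 2
        and raw[0] & 0x0F == 8
        and raw[0] >> 4 <= 7
        and ((raw[0] << 8) | raw[1]) % 31 == 0
    ):
        return "zlib (probable)"
    return "none detected (possibly uncompressed)"


# Usage with locally generated, hypothetical samples.
content = b"id,name\n1,Alice\n"
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as zf:
    zf.writestr("export.csv", content)

samples = {
    "gzip": gzip.compress(content),
    "zip": zip_buffer.getvalue(),
    "bzip2": bz2.compress(content),
    "zlib": zlib.compress(content),
    "plain": content,
}
for label, data in samples.items():
    print(f"{label}: {detect_compression(data)}")
```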
