ChristopherMhl-1161 asked MartinJaffer-MSFT commented

Copy Activity - Error with HTTP Request to Download CSV with gzip


Hello Community,

I have a REST API call to download a CSV file in Postman. (see screenshot).

200442-adf-1.png

I would like to build this request in Azure Data Factory to automatically store this CSV in Azure Data Lake Storage.

In the pipeline, I first execute a few HTTP requests to authenticate myself and prepare for the download.
Then I want to use a Copy Activity to save the CSV to the Azure Data Lake Storage via the REST API call.
The REST API call does not need any additional headers or values in the body.

After a successful request in Postman, I get the following response headers.
Here I can see that the CSV was compressed with gzip.

200424-adf-2.png

Therefore, I have configured the following as the source of the Copy Activity:
200432-adf-3.png

However, with this configuration I get the following error message:

ErrorCode=InvalidDataFormat,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The source data has an invalid format. Cannot decompress the source data. Source file name: '../../../xxxx'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.GZip.GZipException,Message=Error GZIP header, first magic byte doesn't match,Source=ICSharpCode.SharpZipLib,'

What could be the reason for this? Where is the difference?



azure-data-factory

1 Answer

MartinJaffer-MSFT answered MartinJaffer-MSFT commented

Hello @ChristopherMhl-1161 and welcome to Microsoft Q&A.

As I understand, you are either having trouble in forming a request, or trouble in decompressing / unzipping the result of said request. You are using HTTP>DelimitedText on the Source side and Blob>DelimitedText on the sink side.

Might I suggest an experiment? Break this up into 2 steps. Instead of downloading as Delimited Text, try downloading as Binary. Then, in another operation, unzip and store as CSV.

The point of this being to determine:
1. Is the result an actual valid file
2. Is there a problem in the compression
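
Outside of Data Factory, the same two-step experiment can be sketched locally in Python. This is only an illustration under assumptions: the helper names `save_raw` and `try_gunzip` are hypothetical, and the payload is whatever bytes the download request returned. Step 1 stores the bytes untouched; step 2 attempts the gzip decompression separately, so a failure in step 2 cannot be confused with a failed download.

```python
import gzip
import zlib
from pathlib import Path


def save_raw(payload: bytes, path: Path) -> Path:
    """Step 1: store the downloaded bytes exactly as received (binary copy)."""
    path.write_bytes(payload)
    return path


def try_gunzip(path: Path):
    """Step 2: try to decompress the stored file as gzip.

    Returns (True, decompressed_bytes) on success,
    or (False, error_description) if the file is not valid gzip.
    """
    raw = path.read_bytes()
    try:
        return True, gzip.decompress(raw)
    except (OSError, EOFError, zlib.error) as exc:
        # gzip.BadGzipFile is a subclass of OSError
        return False, f"{type(exc).__name__}: {exc}"
```

If step 1 succeeds and step 2 fails, the stored file is worth opening in a text or hex editor to see what was actually returned.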

There were a few possibilities that crossed my mind.
- If your REST call doesn't need any additional headers, then how is the request correlated with your login? Maybe you got back an error, and that couldn't be unzipped.
- Does the call return both a body and an attachment? Are we certain which is being saved?

The magic bytes are the first two bytes of a gzip file (0x1f 0x8b), and they identify the data as gzip-compressed. If the first magic byte doesn't match, then check for corruption, check which type of compression was used, and check that the data is compressed at all.
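
As a rough illustration of that check, the leading bytes of the saved file can be inspected directly. The function name `sniff_payload` and the returned labels are hypothetical; the gzip signature (0x1f 0x8b) and the ZIP local-file signature (`PK\x03\x04`) are the standard ones. The HTML and JSON branches are only a heuristic for spotting an error page or error message that was returned instead of the file.

```python
GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of every gzip member
ZIP_MAGIC = b"PK\x03\x04"     # local file header signature of a ZIP archive


def sniff_payload(data: bytes) -> str:
    """Guess what kind of payload was downloaded from its first bytes."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    # Heuristic: an error response is often an HTML page or a JSON document.
    head = data[:64].lstrip().lower()
    if head.startswith((b"<!doctype", b"<html")):
        return "html"
    if head.startswith((b"{", b"[")):
        return "json"
    return "uncompressed-or-unknown"
```

A result other than `"gzip"` would be consistent with the "first magic byte doesn't match" error: the source is then either not compressed, compressed in a different format, or not the expected file.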


Hello @MartinJaffer-MSFT

thank you very much for your answer.

That is a very good idea to split the task into 2 steps and set binary as source and target format.

Even with the binary format the copy activity is still not successful. So it is probably due to the authentication.

I would like to give you a short overview of the pipeline and answer your questions.
Unfortunately, the API only works via cookie authentication, so I had to build the pipeline as follows:

201072-adf-4.png

  1. HTTP request for authentication. Here I get the cookie / token that I also use in the other HTTP requests.

  2. I request the download for certain data.

  3. I check if the download is ready.

  4. It is the same request as in the Copy Activity source. (I will upload a screenshot of the output.)

  5. This is the attempt to download the CSV or compressed file to ADLS2.

This is the output from step 4:
(The Response tag is much larger, I just shortened it for readability).

200950-adf-5.png



Authentication:
In the HTTP Requests, I include the header as follows:

201085-adf-8.png

201017-adf-6.png

In the copy activity it looks like this:

201055-adf-9.png

201016-adf-7.png


Am I doing something wrong when passing the cookie in the Copy Activity?

201074-adf-10.png

I have tested the connection with the ADLS Linked Service before and it was successful.

Many thanks in advance!



Hello @MartinJaffer-MSFT ,

did you write a reply yesterday? I got an email but do not see a new message / reply.
