ADF fails to unzip files

Amit Srivastava 6 Reputation points
2020-09-15T17:22:42.207+00:00

Hi,
I have zip files on an SFTP location. Each zip file contains a single CSV and is approximately 80 MB in size. I need to ingest them through ADF, unzip them, and save the CSV files to ADLS. I am using the ZipDeflate compression type in my source dataset. The compression type on the sink is set to None.

However, I get the following error:

"errorCode": "2200", "message": "Failure happened on 'Sink' side. ErrorCode=UserErrorUnzipInvalidFile,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The file 'XXXXXX.zip' is not a valid Zip file with Deflate compression method.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.IO.InvalidDataException,Message=End of Central Directory record could not be found.,Source=Microsoft.DataTransfer.ClientLibrary,'", "failureType": "UserError", "target": "XXXXXX", "details": []

When I use a smaller version of the zip file, about 5 KB (after reducing the number of rows in the CSV), the same settings work fine.

Assuming there might be a problem with how my source files were zipped, I also tried another zip file of about 220 MB that I got from a different source. I got the same error again.

Can anyone help me with this?

Thanks,
Amit

Azure Data Factory

1 answer

  1. MartinJaffer-MSFT 26,061 Reputation points
    2020-10-26T21:47:27.467+00:00

    Reviewing the support case, it seems the file was corrupted, or its format was changed, as it moved through SFTP.

    For anyone reading this and seeking help, here are the recommended steps to locate the point of failure.

    Manually download and unzip (decompress) the file at each place it travels through.
    In this case, that means downloading it from blob and unzipping it, then downloading it from SFTP and unzipping it. If either attempt fails, you know the file is not in the correct format at that location. If both succeed, yet Data Factory still fails, then it may be an issue with Data Factory. Please let us know if that is the case.
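
The manual check above can also be scripted. The following is a minimal sketch, not part of the original answer, that assumes you have already downloaded a copy of the file from each location (for example one from SFTP and one from blob) to local disk. The function name `check_zip` and the report keys are hypothetical; only the Python standard library is used. It reports the file size, a SHA-256 hash, whether Python can locate a valid zip structure, and the first member that fails its CRC check, if any.

```python
import hashlib
import zipfile
from pathlib import Path


def check_zip(path):
    """Return a small report on whether the file at `path` is a readable zip.

    `check_zip` and the report keys are illustrative names, not an ADF API.
    """
    file_path = Path(path)

    # Hash in chunks so that large archives are not loaded into memory at once.
    digest = hashlib.sha256()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)

    report = {
        "path": str(file_path),
        "size_bytes": file_path.stat().st_size,
        "sha256": digest.hexdigest(),
        # is_zipfile looks for the end-of-central-directory record.
        "is_zip": zipfile.is_zipfile(str(file_path)),
        "members": [],
        "first_bad_member": None,
    }

    if report["is_zip"]:
        with zipfile.ZipFile(str(file_path)) as archive:
            report["members"] = archive.namelist()
            # testzip reads every member and returns the first name whose
            # CRC or header is bad, or None if all members are intact.
            report["first_bad_member"] = archive.testzip()

    return report
```

Run `check_zip` on the copy taken from each location and compare the reports. If the size or hash differs between two copies, the file was altered somewhere between those two points. If `is_zip` is false for a copy, the archive is already unreadable at that location, before Data Factory touches it.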