Azure Data Factory: copying compressed data from FTP fails with a "Stream was too long" error

Isaque CAMPELLO 11 Reputation points
2022-02-07T15:35:41.077+00:00

I have several pipelines that copy and uncompress data from an FTP server to an Azure container. However, one of the activities fails with the following error details:

Error code: 2200
Failure Type: User configuration issue
Details: ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Upload file failed at path container\folder_1.ZIP/folder_2.zip/file.xml.,Source=mscorlib,''Type=System.Reflection.TargetInvocationException,Message=Exception has been thrown by the target of an invocation.,Source=mscorlib,''Type=System.IO.IOException,Message=Stream was too long.,Source=mscorlib,'

The data being ingested are .zip folders, usually with other compressed folders inside.
The Copy activity has the Recursively option enabled, the source has its Compression Type set to ZipDeflate (.zip), and the sink is an Azure container. The compressed file is just over 2 GB, but some of the other files we have transferred were much larger.
The specific file named in the error is also very small, less than 6 KB, and it does show up in the container. The details of the pipeline run show that the data read was 2,001 GB, but the data written was only 5,074 MB. I tried it a couple of times, and the time to failure varied from just over 2 hours to over 5 hours.

The issue doesn't seem to be the files themselves: I was able to download the folder directly to my PC, uncompress it, and open the files without any issue.
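For anyone who wants to repeat that local check, here is a minimal sketch in Python (standard library only) that walks a .zip file, descends into inner .zip entries, and lists the uncompressed size of every entry. The function name `list_nested_zip_sizes`, the placeholder path, and the 2 GiB threshold are assumptions made for illustration. The threshold is only a guess based on the "Stream was too long" message, which .NET in-memory streams raise when they exceed their maximum length. Whether that is what happens inside the Copy activity is not confirmed.

```python
import io
import zipfile

# Assumed threshold: the largest length a .NET in-memory stream can hold.
# This is a guess at the cause, not something the error message confirms.
TWO_GIB_LIMIT = 2**31 - 1


def list_nested_zip_sizes(source, prefix=""):
    """Yield (path, uncompressed_size) for every file in a zip archive,
    descending into entries whose names end in .zip."""
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = f"{prefix}{info.filename}"
            yield path, info.file_size
            if info.filename.lower().endswith(".zip"):
                # The inner archive is loaded fully into memory, which is
                # acceptable for a one-off check on a machine with enough RAM.
                with archive.open(info) as inner:
                    data = io.BytesIO(inner.read())
                yield from list_nested_zip_sizes(data, prefix=path + "/")


def report_large_entries(zip_path, limit=TWO_GIB_LIMIT):
    """Return the entries whose uncompressed size exceeds the limit."""
    return [(p, s) for p, s in list_nested_zip_sizes(zip_path) if s > limit]
```

Calling `report_large_entries("folder_1.ZIP")` on the downloaded file (the path here is a placeholder) returns an empty list if no single entry, including any inner .zip, is larger than the assumed limit once uncompressed.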

I have tried setting the Block Size to 30 MB and the copy behaviour to "flatten hierarchy", but to no avail. The only difference between this folder and the others we have ingested is that this one has 4 inner folders, more than the rest, but it isn't larger (neither compressed nor uncompressed).

Any ideas on how to solve this, or on what might be causing it, would be highly appreciated.

Azure Data Factory