ADF Copy Activity - xlsx file does not copy when Disable chunking is unchecked

annu.shokeen@team.telstra.com 6 Reputation points
2023-02-02T03:48:27.89+00:00

Hi Team,

The issue occurs with the Copy activity in ADF.

The issue occurs when Disable chunking is unchecked (the default). As soon as we set Disable chunking to true, the file copies successfully. The file has about 36,000 records. We receive the following error if we try to copy the file with chunking enabled:

"ErrorCode=ExcelUnsupportedFormat,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Only '.xls' and '.xlsx' format is supported in reading excel file while error is '   at Microsoft.DataTransfer.Common.TasksCoordinator.CheckTaskFailureOrCancellation()\r\n   at Microsoft.DataTransfer.ClientLibrary.BufferAssembler.<GetBuffers>d__26.MoveNext()\r\n   at Microsoft.DataTransfer.Runtime.MultipartFileParallelReadProcessor.<ReadBuffers>d__14.MoveNext()\r\n   at Microsoft.DataTransfer.ClientLibrary.TransferStream.ReadInternal(Byte[] buffer, Int32 offset, Int32 count)\r\n   at Microsoft.DataTransfer.ClientLibrary.TransferStream.Read(Byte[] buffer, Int32 offset, Int32 count)\r\n   at ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputBuffer.Fill()\r\n   at ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputBuffer.ReadLeByte()\r\n   at ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputBuffer.ReadLeInt()\r\n   at ICSharpCode.SharpZipLib.Zip.ZipInputStream.GetNextEntry()\r\n   at NPOI.OpenXml4Net.Util.ZipInputStreamZipEntrySource..ctor(ZipInputStream inp)\r\n   at NPOI.OpenXml4Net.OPC.ZipPackage..ctor(Stream filestream, PackageAccess access)\r\n   at NPOI.OpenXml4Net.OPC.OPCPackage.Open(Stream in1)\r\n   at NPOI.Util.PackageHelper.Open(Stream is1)\r\n   at NPOI.XSSF.UserModel.XSSFWorkbook..ctor(Stream is1)\r\n   at Microsoft.DataTransfer.ClientLibrary.ExcelUtility.GetExcelWorkbook(String fileExtension, TransferStream stream)'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error occurred when trying to fetch the SFTP file '/AdaptiveMobilityFundsReport_310120231931767.xlsx'. This could be a transient issue and you may rerun the job. If it fails again continuously, contact customer support.,Source=Microsoft.DataTransfer.ClientLibrary.SftpConnector,'

We tried many times, but the file does not copy. It works when chunking is disabled.

Now we have another xlsx file with about 60,000 records. When we try to copy it with chunking disabled, we get the following error:

"ErrorCode=ExcelUnsupportedFormat,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Only '.xls' and '.xlsx' format is supported in reading excel file while error is '   at Microsoft.DataTransfer.Common.TasksCoordinator.CheckTaskFailureOrCancellation()\r\n   at Microsoft.DataTransfer.ClientLibrary.BufferAssembler.<GetBuffers>d__26.MoveNext()\r\n   at Microsoft.DataTransfer.Runtime.MultipartFileParallelReadProcessor.<ReadBuffers>d__14.MoveNext()\r\n   at Microsoft.DataTransfer.ClientLibrary.TransferStream.ReadInternal(Byte[] buffer, Int32 offset, Int32 count)\r\n   at Microsoft.DataTransfer.ClientLibrary.TransferStream.Read(Byte[] buffer, Int32 offset, Int32 count)\r\n   at ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputBuffer.Fill()\r\n   at ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream.Fill()\r\n   at ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream.Read(Byte[] buffer, Int32 offset, Int32 count)\r\n   at ICSharpCode.SharpZipLib.Zip.ZipInputStream.BodyRead(Byte[] buffer, Int32 offset, Int32 count)\r\n   at NPOI.OpenXml4Net.Util.ZipInputStreamZipEntrySource.FakeZipEntry..ctor(ZipEntry entry, ZipInputStream inp)\r\n   at NPOI.OpenXml4Net.Util.ZipInputStreamZipEntrySource..ctor(ZipInputStream inp)\r\n   at NPOI.OpenXml4Net.OPC.ZipPackage..ctor(Stream filestream, PackageAccess access)\r\n   at NPOI.OpenXml4Net.OPC.OPCPackage.Open(Stream in1)\r\n   at NPOI.Util.PackageHelper.Open(Stream is1)\r\n   at NPOI.XSSF.UserModel.XSSFWorkbook..ctor(Stream is1)\r\n   at Microsoft.DataTransfer.ClientLibrary.ExcelUtility.GetExcelWorkbook(String fileExtension, TransferStream stream)'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Sftp read without chunking initialize failed, offset '4194304'. This is a permanent error, retrying will not help, please contact customer support.,Source=Microsoft.DataTransfer.ClientLibrary.SftpConnector,'"

This file is copied when we enable chunking.

But this is a problem for us, because it's a simple pipeline and we want to use the same pipeline to ingest both files. Ideally, the first file should copy regardless of the chunking setting, since it is the smaller file. I don't understand why the first file fails when chunking is enabled.

Thanks,
Annu

Azure Data Factory

2 answers

  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2023-02-09T18:42:40.32+00:00

    Hi annu.shokeen@team.telstra.com,

    Welcome to Microsoft Q&A forum and thanks for your query.

    Based on the description, the issue seems very strange. Could you please confirm whether both files come from the same SFTP source? Could you also share the pipeline run IDs and activity IDs for the failed runs in both scenarios, so that I can ask the internal team to look at the logs?

    By default, the ADF SFTP connector first tries to get the file length, then divides the file into multiple parts and reads them in parallel. The disableChunking setting lets you indicate whether your SFTP server supports getting the file length or seeking to read from a certain offset. If your SFTP server does not support chunking, you will need to disable it (disableChunking = true).
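The chunked read described above can be illustrated with a small local simulation in Python. This is not ADF's implementation: the function names (`plan_ranges`, `read_range`, `chunked_read`, `make_fake_xlsx`), the in-memory "remote" file, the hypothetical `honors_seek` flag and the 1 KiB part size are all assumptions made for illustration. The sketch gets the length, splits it into byte ranges, reads the ranges in parallel and joins them in order. It also shows that if a server did not honor offset reads, the reassembled bytes of an `.xlsx` (a zip container) would be corrupted and a zip/workbook parser could no longer open them.

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor


def plan_ranges(file_length: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, file_length) into consecutive (offset, length) parts."""
    return [(off, min(chunk_size, file_length - off))
            for off in range(0, file_length, chunk_size)]


def read_range(remote: bytes, offset: int, length: int, honors_seek: bool) -> bytes:
    """Simulate one ranged read (one connection) against an in-memory file.

    honors_seek=False models a hypothetical server that ignores the requested
    offset and always returns bytes from the start of the file.
    """
    stream = io.BytesIO(remote)
    if honors_seek:
        stream.seek(offset)
    return stream.read(length)


def chunked_read(remote: bytes, chunk_size: int,
                 max_concurrent_connections: int, honors_seek: bool = True) -> bytes:
    """Get the length, split into parts, read parts in parallel, rejoin in order."""
    ranges = plan_ranges(len(remote), chunk_size)
    with ThreadPoolExecutor(max_workers=max_concurrent_connections) as pool:
        # pool.map yields results in submission order, so parts rejoin in file order.
        parts = pool.map(lambda r: read_range(remote, r[0], r[1], honors_seek), ranges)
        return b"".join(parts)


def make_fake_xlsx() -> bytes:
    """Build a small zip container standing in for an .xlsx workbook."""
    buf = io.BytesIO()
    info = zipfile.ZipInfo("xl/worksheets/sheet1.xml", date_time=(2023, 2, 2, 0, 0, 0))
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, b"<row/>" * 1000)  # stored uncompressed, about 6 KB
    return buf.getvalue()


if __name__ == "__main__":
    original = make_fake_xlsx()
    good = chunked_read(original, chunk_size=1024, max_concurrent_connections=4)
    bad = chunked_read(original, chunk_size=1024, max_concurrent_connections=4,
                       honors_seek=False)
    print(good == original, zipfile.is_zipfile(io.BytesIO(good)))  # True True
    print(bad == original, zipfile.is_zipfile(io.BytesIO(bad)))    # False False
```

Joining the parts in submission order matters: an out-of-order or wrong-offset part leaves the file the same length but with the wrong bytes, which is the kind of damage a zip-based format cannot tolerate.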
    Could you please try setting disableChunking = false and maxConcurrentConnections = 1 and see if that resolves the issue?
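In the Copy activity JSON, these two settings would typically sit under the source's `storeSettings`. The Python snippet below only builds and prints such a fragment. The surrounding structure (`ExcelSource`, `SftpReadSettings`, `recursive`) is an assumption to check against your own pipeline's JSON view, not an export of the pipeline discussed here.

```python
import json

# Hypothetical Copy activity source fragment for an Excel dataset on SFTP.
# disableChunking and maxConcurrentConnections are the settings suggested above;
# the rest of the structure is assumed and should be checked against the pipeline JSON.
copy_source = {
    "type": "ExcelSource",
    "storeSettings": {
        "type": "SftpReadSettings",
        "recursive": True,
        "disableChunking": False,        # keep chunked (ranged) reads enabled
        "maxConcurrentConnections": 1,   # but open only one connection to the server
    },
}

if __name__ == "__main__":
    print(json.dumps({"source": copy_source}, indent=4))
```

Limiting the source to one connection keeps chunking on while removing concurrent reads against the server, which helps separate "the server cannot seek" from "the server struggles with parallel sessions".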

    Do let us know how it goes.

    Thank you

  2. annu.shokeen@team.telstra.com 6 Reputation points
    2023-02-09T18:43:08.3366667+00:00

    Hi KranthiPakala,

    Thanks for your suggestions. It turned out that we had different settings for the two folders we were using in SFTP. I am not sure whether they were on different servers, but one of them had encryption enabled. After we removed encryption from that folder, the file is read successfully with the default chunking settings.

    Thank you.
