Data Factory Copy activity failing with Error code 2200 - Operation failed as split count exceeding upper bound of 1000000

kranthi k senapathi 106 Reputation points
2020-10-25T13:53:15.673+00:00

Hi,
I am copying a large dataset from ADLS Gen1 to Azure Synapse Analytics (formerly Azure SQL Data Warehouse) using Data Factory, with PolyBase enabled and the staging blob storage settings configured.
The source data is in *.parquet format and is partitioned. The Copy activity fails. In the first attempt I gave a wildcard path covering all the parquet files, and it threw this error: "Operation failed as split count exceeding upper bound of 1000000".
In the second attempt I reduced the copy size to just one partition, specifying a single partition folder in ADLS - but it gave a different error: "file is not a parquet file (too small)". Please help me resolve this. It looks like a configuration issue, but where should it be fixed? Thanks
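For context on the second error: a valid parquet file starts and ends with the 4-byte magic `PAR1`, so the smallest structurally valid file is 12 bytes; zero-byte part files or marker files (e.g. a Spark `_SUCCESS` file caught by a broad wildcard) would trip "file is not a parquet file (too small)". A minimal, hypothetical sketch of that structural check (the helper name is illustrative, not part of Data Factory):

```python
# Hypothetical local check for the "not a parquet file (too small)" error.
# A valid parquet file starts and ends with the 4-byte magic "PAR1";
# the smallest possible file is 12 bytes (magic + footer length + magic).
PARQUET_MAGIC = b"PAR1"
MIN_PARQUET_SIZE = 12

def is_valid_parquet_bytes(data: bytes) -> bool:
    """Cheap structural check: size and magic bytes only (no schema parsing)."""
    return (
        len(data) >= MIN_PARQUET_SIZE
        and data.startswith(PARQUET_MAGIC)
        and data.endswith(PARQUET_MAGIC)
    )
```

Running a check like this over the head and tail bytes of each file in the partition would surface empty or truncated files before PolyBase rejects them.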

"dataRead": 25884501402,
"filesRead": 1674684,
"sourcePeakConnections": 137,
"copyDuration": 5796,
"throughput": 4361.255,
"sqlDwPolyBase": true,
"errors": [
    {
        "Code": 11404,
        "Message": "ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=110825;Operation failed as split count exceeding upper bound of 1000000.,Source=.Net SqlClient Data Provider,SqlErrorNumber=110825,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=110825,State=1,Message=110825;Operation failed as split count exceeding upper bound of 1000000.,},],'",
        "EventType": 0,
        "Category": 5,
        "Data": {},
        "MsgId": null,
        "ExceptionType": null,
        "Source": null,
        "StackTrace": null,
        "InnerEventInfos": []
    }
]
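Note that the `filesRead` value in the output above (1,674,684) is itself above 1,000,000. Assuming the blob paths under the wildcard can be listed up front (e.g. via an ADLS SDK), a hypothetical tally like the following would reveal before the copy runs which partition folders contribute the most files (the helper names and path layout are illustrative):

```python
# Hypothetical tally of file paths per top-level partition folder, to spot
# when a wildcard copy would exceed PolyBase's 1,000,000-file upper bound.
POLYBASE_FILE_LIMIT = 1_000_000  # bound reported by SQL error 110825

def files_per_partition(paths):
    """Group a flat list of file paths by their first path segment."""
    counts = {}
    for p in paths:
        partition = p.strip("/").split("/", 1)[0]
        counts[partition] = counts.get(partition, 0) + 1
    return counts

def exceeds_limit(paths, limit=POLYBASE_FILE_LIMIT):
    """True when the total file count would break the PolyBase bound."""
    return len(paths) > limit
```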


1 answer

  1. HarithaMaddi-MSFT 10,136 Reputation points
    2020-10-27T10:43:59.48+00:00

    Hi @kranthi k senapathi ,

    Welcome to Microsoft Q&A Platform.

    I looked at previous support cases on similar issues from customers and found a known limitation: "The PolyBase process has an upper bound limit of 1,000,000 files". As visible in the error output you posted, your copy read 1,674,684 files, which exceeds this value (highlighted in the image below). Kindly redesign the process so each run picks files within this scope, and let us know if the failure persists. I will also discuss with the Product team about getting this limitation added to the documentation.

    [Image: Copy activity error output with the file count exceeding 1,000,000 highlighted]

    Thanks for your patience!
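One way to redesign the process along these lines, assuming each partition folder can be copied independently and you know its file count, is to group partition folders into batches that each stay under the bound and run one Copy activity per batch. A minimal sketch (the batching helper is hypothetical, not a Data Factory API):

```python
# Hypothetical greedy batching of partition folders so that each Copy
# activity run stays under PolyBase's 1,000,000-file upper bound.
POLYBASE_FILE_LIMIT = 1_000_000

def batch_partitions(file_counts: dict, limit: int = POLYBASE_FILE_LIMIT):
    """Group partition names into batches whose file counts sum to <= limit."""
    batches, current, current_total = [], [], 0
    for name, count in sorted(file_counts.items()):
        if count > limit:
            # A single partition over the bound must be split further upstream.
            raise ValueError(f"partition {name} alone exceeds the {limit}-file limit")
        if current_total + count > limit:
            batches.append(current)
            current, current_total = [], 0
        current.append(name)
        current_total += count
    if current:
        batches.append(current)
    return batches
```

Each returned batch could then drive one Copy activity (for example via a ForEach over the batch list in the pipeline).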

