Data Factory Copy activity failing with Error code 2200 - Operation failed as split count exceeding upper bound of 1000000

kranthi k senapathi 106 Reputation points
2020-10-25T13:53:15.673+00:00

Hi,
I am copying a large dataset from ADLS Gen1 to Azure Synapse Analytics (formerly Azure SQL Data Warehouse) using Data Factory, with PolyBase enabled and the staging blob storage settings configured.
The source data is in *.parquet format and is partitioned. The Copy activity fails. In the first attempt I gave a wildcard path covering all the parquet files, and it threw this error: "Operation failed as split count exceeding upper bound of 1000000".
In the second attempt I reduced the copy size to just one partition, specifying a single partition folder in ADLS - but it gave a different error: "file is not a parquet file (too small)". Please help me resolve this. It looks like a configuration issue, but where should it be fixed? Thanks
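For context on the second error: a valid parquet file starts and ends with the 4-byte magic `PAR1`, so the smallest structurally valid file is 12 bytes; zero-byte part files or marker files (e.g. a Spark `_SUCCESS` file caught by a broad wildcard) would trip "file is not a parquet file (too small)". A minimal, hypothetical sketch of that structural check (the helper name is illustrative, not part of Data Factory):

```python
# Hypothetical local check for the "not a parquet file (too small)" error.
# A valid parquet file starts and ends with the 4-byte magic "PAR1";
# the smallest possible file is 12 bytes (magic + footer length + magic).
PARQUET_MAGIC = b"PAR1"
MIN_PARQUET_SIZE = 12

def is_valid_parquet_bytes(data: bytes) -> bool:
    """Cheap structural check: size and magic bytes only (no schema parsing)."""
    return (
        len(data) >= MIN_PARQUET_SIZE
        and data.startswith(PARQUET_MAGIC)
        and data.endswith(PARQUET_MAGIC)
    )
```

Running a check like this over the head and tail bytes of each file in the partition would surface empty or truncated files before PolyBase rejects them.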

"dataRead": 25884501402,
"filesRead": 1674684,
"sourcePeakConnections": 137,
"copyDuration": 5796,
"throughput": 4361.255,
"sqlDwPolyBase": true,
"errors": [
    {
        "Code": 11404,
        "Message": "ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=110825;Operation failed as split count exceeding upper bound of 1000000.,Source=.Net SqlClient Data Provider,SqlErrorNumber=110825,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=110825,State=1,Message=110825;Operation failed as split count exceeding upper bound of 1000000.,},],'",
        "EventType": 0,
        "Category": 5,
        "Data": {},
        "MsgId": null,
        "ExceptionType": null,
        "Source": null,
        "StackTrace": null,
        "InnerEventInfos": []
    }
]
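Note that the `filesRead` value in the output above (1,674,684) is itself above 1,000,000. Assuming the blob paths under the wildcard can be listed up front (e.g. via an ADLS SDK), a hypothetical tally like the following would reveal before the copy runs which partition folders contribute the most files (the helper names and path layout are illustrative):

```python
# Hypothetical tally of file paths per top-level partition folder, to spot
# when a wildcard copy would exceed PolyBase's 1,000,000-file upper bound.
POLYBASE_FILE_LIMIT = 1_000_000  # bound reported by SQL error 110825

def files_per_partition(paths):
    """Group a flat list of file paths by their first path segment."""
    counts = {}
    for p in paths:
        partition = p.strip("/").split("/", 1)[0]
        counts[partition] = counts.get(partition, 0) + 1
    return counts

def exceeds_limit(paths, limit=POLYBASE_FILE_LIMIT):
    """True when the total file count would break the PolyBase bound."""
    return len(paths) > limit
```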


1 answer

  1. HarithaMaddi-MSFT 10,136 Reputation points
    2020-10-27T10:43:59.48+00:00

    Hi @kranthi k senapathi ,

    Welcome to Microsoft Q&A Platform.

    I looked at previous support cases on similar issues from customers and found a known limitation: "The PolyBase process has an upper bound limit of 1,000,000 files". As visible in the error output you posted, your copy read 1,674,684 files, which exceeds this value (highlighted in the image below). Kindly redesign the process so each run picks files within this scope, and let us know if the failure persists. I will also discuss with the Product team about getting this limitation added to the documentation.

    [Image: Copy activity error output with the file count exceeding 1,000,000 highlighted]

    Thanks for your patience!
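One way to redesign the process along these lines, assuming each partition folder can be copied independently and you know its file count, is to group partition folders into batches that each stay under the bound and run one Copy activity per batch. A minimal sketch (the batching helper is hypothetical, not a Data Factory API):

```python
# Hypothetical greedy batching of partition folders so that each Copy
# activity run stays under PolyBase's 1,000,000-file upper bound.
POLYBASE_FILE_LIMIT = 1_000_000

def batch_partitions(file_counts: dict, limit: int = POLYBASE_FILE_LIMIT):
    """Group partition names into batches whose file counts sum to <= limit."""
    batches, current, current_total = [], [], 0
    for name, count in sorted(file_counts.items()):
        if count > limit:
            # A single partition over the bound must be split further upstream.
            raise ValueError(f"partition {name} alone exceeds the {limit}-file limit")
        if current_total + count > limit:
            batches.append(current)
            current, current_total = [], 0
        current.append(name)
        current_total += count
    if current:
        batches.append(current)
    return batches
```

Each returned batch could then drive one Copy activity (for example via a ForEach over the batch list in the pipeline).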

