Data Factory Data flow Error Cluster creation failed

Question

Colinn Calaguas 45 Reputation points

I have a Data Factory pipeline that has been running for a year without any issues, but we suddenly received this error:

Internal Server Error In Synapse batch operation: 'Cluster creation failed due to unexpected error. Please retry'

This error came from a Data Flow activity that removes the last commas from every row in the source CSV file. That file contains 7000+ rows. Here is what the activity and its properties look like (screenshot: RemoveComma).

In the run history of the error, I checked the input and output of the failed activity run.

INPUT:

{
    "dataflow": {
        "referenceName": "Remove Comma",
        "type": "DataFlowReference",
        "parameters": {},
        "datasetParameters": {
            "GetSourceFile": {},
            "sink1": {}
        }
    },
    "staging": {},
    "compute": {
        "coreCount": 8,
        "computeType": "General"
    },
    "traceLevel": "Fine",
    "continuationSettings": {
        "customizedCheckpointKey": "Accounting_CleanUp-Remove Last Two Commas-daf923f2-bf64-4531-aa3b-daa49fb8399f"
    }
}

OUTPUT:

{
    "status": {
        "Name": "Dataflow",
        "WorkspaceName": "Sample",
        "ComputeName": "General-8-1",
        "Id": 281,
        "FabricJobId": null,
        "AppId": null,
        "Result": "Failed",
        "AppInfo": null,
        "SchedulerInfo": {
            "CurrentState": "ended",
            "SubmittedAt": "02/28/2024 13:09:26",
            "QueuedAt": "02/28/2024 13:09:26",
            "ScheduledAt": "02/28/2024 13:09:31",
            "EndedAt": "02/28/2024 13:35:27",
            "CancellationRequestedAt": null
        },
        "ErrorInfo": [
            {
                "Message": "[plugins.Sample.General-8-1-a65edd63-a1ae-4d97-9e7c-e6373b8e4e4e.5879e7dc-a96c-40b2-b118-dc1e96bdd010 WorkspaceType:<ADF> CCID:<>] MaxClusterCreationAttempts=[3] Attempt=[0] ComputeNodeSize=[Small] ClusterId=[c04f0d81-84af-4d54-a7f0-d91b3eb9a586] AdlaResourceId=[] [Creation] -> [Cleanup]. The cluster creation has failed more than the [3] of times. IsTimeout=[False] IsTerminal=[True] IsRetryable=[True] RetryOnClusterCreation=[False] ErrorType=[None] ErrorMessage=[]",
                "ErrorCode": "FAILED_CLUSTER_CREATION"
            }
        ],
        "LivyInfo": {
            "JobCreationRequest": null
        },
        "State": "error",
        "Log": null,
        "PluginInfo": {
            "PreparationStartedAt": "02/28/2024 13:09:31",
            "ResourceAcquisitionStartedAt": "02/28/2024 13:09:31",
            "SubmissionStartedAt": null,
            "MonitoringStartedAt": null,
            "CleanupStartedAt": "02/28/2024 13:35:27",
            "CurrentState": "Ended"
        }
    },
    "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime",
    "executionDuration": 1705,
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    }
}
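
For anyone triaging a similar failure, the output above can be summarized with a short script. This is only a minimal Python sketch, not an official tool: it assumes the `Key=[Value]` layout of the `Message` string shown above stays stable, and the helper names `parse_fields` and `seconds_between` are made up for illustration. It pulls the flags out of the message and measures how long the run sat in resource acquisition before cleanup began.

```python
import re
from datetime import datetime

# Abbreviated copy of the ErrorInfo "Message" from the activity output
# (the leading plugin identifier is omitted for brevity).
message = (
    "MaxClusterCreationAttempts=[3] Attempt=[0] ComputeNodeSize=[Small] "
    "ClusterId=[c04f0d81-84af-4d54-a7f0-d91b3eb9a586] AdlaResourceId=[] "
    "[Creation] -> [Cleanup]. The cluster creation has failed more than the [3] of times. "
    "IsTimeout=[False] IsTerminal=[True] IsRetryable=[True] "
    "RetryOnClusterCreation=[False] ErrorType=[None] ErrorMessage=[]"
)

def parse_fields(text):
    """Collect every Key=[Value] pair found in the message into a dict."""
    return dict(re.findall(r"(\w+)=\[([^\]]*)\]", text))

def seconds_between(start, end, fmt="%m/%d/%Y %H:%M:%S"):
    """Elapsed seconds between two timestamps in the format used by the output."""
    return int((datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds())

fields = parse_fields(message)

# Values copied from PluginInfo in the output.
acquisition_seconds = seconds_between("02/28/2024 13:09:31", "02/28/2024 13:35:27")

print(fields["ComputeNodeSize"], fields["IsTimeout"], fields["IsRetryable"])
print(acquisition_seconds)
```

With the values from this run, the script reports `Small False True` and 1556 seconds: the job spent roughly 26 minutes trying to acquire a cluster, and `SubmissionStartedAt` is null, so the data flow itself never started running.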

I'm not sure what suddenly went wrong, since we didn't change anything in this pipeline. Should we increase the compute size to something larger than Small, since the data being ingested is growing?

Lang, Robert 41 Reputation points

2024-02-28T15:10:45.48+00:00

Same issue here. All of our pipelines that push data to a Snowflake instance started failing this morning at about 6:30 AM. They ran successfully at 5:30 AM. We are also unable to start any debug clusters on this data factory. I recreated one of the pipelines in another data factory, and it pushed data successfully.
The data factory that we are having issues with is in East US 2. The data factory that pushed data successfully is in East US. Microsoft's Azure Service Status is not showing any issues with Data Factory in either of these regions.
Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2024-02-28T20:44:18.1233333+00:00

Hello Colinn Calaguas, Lang, Robert

The product group (PG) has acknowledged the issue and is actively working on it. I will provide an update as soon as I hear something from them.

Thank you for your patience.
Colinn Calaguas 45 Reputation points

2024-02-28T23:37:40.32+00:00

Yes, it is also in the same region as us (East US 2). Thank you for acknowledging this issue immediately. Appreciate it.

Accepted answer

Answer 1

Bhargava-MSFT 31,261 Microsoft Employee Moderator

Hello Colinn Calaguas,

The issue was mitigated last night. You shouldn't see this error anymore. In case you are still experiencing the issue, please let me know.

If this answers your question, please consider accepting it by selecting Accept answer and up-voting, as this helps the community find answers to similar questions.

Lang, Robert 41 Reputation points

2024-02-29T19:03:21.1333333+00:00

All of our issues were resolved last night, starting sometime after 8:30 PM EST.
Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2024-02-29T19:54:00.2833333+00:00

Thank you for the confirmation, Lang, Robert
Ashir Khan 0 Reputation points

2024-09-26T08:33:05.4733333+00:00

Hi @Bhargava-MSFT

I'm currently experiencing the same issue in the Qatar region as well.
