Copy Dataverse data into Azure SQL using Synapse Link: the DataverseToSQL step is not running

Greenwood, Justin -Administrator 75 Reputation points
2023-04-03T14:57:32.3+00:00

I have managed to create an Azure Synapse Link for Dataverse, and it is all working fine.

In the Azure Synapse workspace I have connected to my source storage account and to the database.

I have set up a trigger; in fact, I have done everything described in https://learn.microsoft.com/en-us/power-apps/maker/data-platform/azure-synapse-link-pipelines?tabs=synapse-analytics.

But I have noticed it is sometimes skipping the DataverseToSQL step. Is there any reason for this? I have checked the folders and they have data in them, but for some reason it is not pulling the data over.



Accepted answer
  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2023-04-13T21:59:05.9+00:00

    Ray Chung, Greenwood, Justin -Administrator:

    A record was created and included in incremental folder 1. The same record was later edited and included in incremental folder 2. Processing of incremental folder 1 failed, while processing of incremental folder 2 was successful. After a rerun, processing of incremental folder 2 was still successful. Which values of the record are current: those from incremental folder 1 or 2?

    When the Orchestrator pipeline encounters an unexpected error and fails for a given folder (incremental folder 1), DataverseToSQLPipelineProcessingLog gets an entry of 0 (failure) for that folder. Subsequent folders (e.g., incremental folder 2) are then marked as 3 (skipped) rather than successful (1). Because the previous folder is not successful (status code <> 1), all subsequent folders keep being skipped (status code = 3). To break out of this skip loop, please follow the resolution steps below:

    1. Identify and resolve the root cause of the pipeline failure (the query after this list can help surface the affected folders).
    2. Manually run the DataverseToSQL pipeline for the failed folder.
    3. After the manual run completes successfully, update the corresponding row in DataverseToSQLPipelineProcessingLog to 1 (success).
    4. Process the subsequent skipped (3) folders manually, sequentially, and in chronological order, and update the Status column of the corresponding DataverseToSQLPipelineProcessingLog rows to 1 (success).
    5. Once the failed folder and all the skipped folders are marked as successful, the DataverseToSQL_Orchestrator pipeline will automatically process the next folder on the next trigger.
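
    For steps 1 and 4, a query along these lines can surface the failed and skipped folders in the order they should be reprocessed. This is a minimal T-SQL sketch, assuming the dbo.[DataverseToSQLPipelineProcessingLog] table and the Status, Container, and Folder columns visible in the CheckPreviousRuns query later in this thread; the container name is a placeholder for your own:

    -- List every folder that is not successful (failed = 0, skipped = 3),
    -- oldest first; folder names are timestamps, so sorting by Folder
    -- gives chronological order.
    SELECT [Folder], [Status]
    FROM dbo.[DataverseToSQLPipelineProcessingLog]
    WHERE [Status] <> 1
      AND [Container] = '<your-dataverse-container>'
    ORDER BY [Folder] ASC;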

    Recommendation:

    Manually executing several skipped folders and updating DataverseToSQLPipelineProcessingLog by hand is time-consuming and error-prone. To avoid that manual effort, please consider building a new pipeline which:

    1. Retrieves the skipped folders in chronological order.
    2. Executes the DataverseToSQL pipeline for each of them sequentially.
    3. Updates the Status column of the corresponding DataverseToSQLPipelineProcessingLog rows to 1 (success); a sketch of this update follows. (The original answer included screenshots of a sample pipeline for processing skipped folders.)
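
    A hedged sketch of the status update such a pipeline would run after a folder has been reprocessed successfully (same table and column assumptions as the query above; the folder value is an example taken from this thread):

    -- Mark the reprocessed folder as successful so the orchestrator
    -- stops skipping the folders that come after it.
    UPDATE dbo.[DataverseToSQLPipelineProcessingLog]
    SET [Status] = 1
    WHERE [Container] = '<your-dataverse-container>'
      AND [Folder] = '2023-04-04T10.40.08Z';  -- the folder that was just reprocessed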

    Greenwood, Justin -Administrator - From your earlier response I see that 594 folders have status <> 1. Because of this, all subsequent folders will be skipped and the main Execute DataverseToSQLPipeline activity will not run. To overcome the issue, as described above, please identify why the initial folders are being skipped, reprocess them in chronological order, and update the log status to 1 for those folders. Once the backlog is cleared, subsequent folders will be copied automatically by the main Orchestrator pipeline.

    "firstRow": {
    
    		"cnt": 594
    
    	},
    

    Important note: If the pipeline is being skipped without any failure, it means that prior pipeline runs for previous folders either failed (status = 0) or were skipped (status = 3) because pipeline execution times overlapped.

    To avoid overlapping execution times, as I mentioned in my previous posts, it is necessary to set concurrency = 1 for your Orchestrator pipeline. That way only one execution is in progress at a time; subsequent pipeline runs are queued and executed in chronological order. Ensuring that the Orchestrator pipeline's concurrency setting is 1 will avoid this scenario in future. I will also provide feedback to the document/template owner to update the template so that concurrency is set to 1 by default in the template gallery.
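
    For reference, in the pipeline's JSON definition (the code view in Synapse Studio) this corresponds to the top-level concurrency property. A minimal fragment, with the activities elided:

    {
        "name": "DataverseToSQL_Orchestrator",
        "properties": {
            "concurrency": 1,
            "activities": [ ... ]
        }
    }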

    I hope this explains why folders are skipped, how that impacts subsequent folder runs, and how to reprocess the skipped folder runs.

    If this is still not clear and you are blocked, I would recommend logging a support ticket so that a support engineer can schedule a call, go through your pipeline history, and suggest the next steps to fix the problem. Please don't forget to Accept Answer and mark Yes for "was this answer helpful" wherever the information provided helps you; this can be beneficial to other community members.

    2 people found this answer helpful.

2 additional answers

  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2023-04-11T22:06:02.6133333+00:00

    @Greenwood, Justin -Administrator Thanks for your response and additional details.

    When I looked at the query of the CheckPreviousRuns activity, it is counting rows matching:

    FROM dbo.[DataverseToSQLPipelineProcessingLog]
    WHERE Status <> 1
      AND (Container = 'dataverse-############-orgd#######' AND Folder <> '2023-04-04T10.40.08Z')

    And as per the condition defined in the If Previous Runs Unsuccessful activity, if CheckPreviousRuns returns a count > 0, it will not trigger the Execute DataverseToSQLPipeline activity.

    Condition:

    @greater(activity('CheckPreviousRuns').output.firstRow.cnt,0)
    
    

    The Execute DataverseToSQLPipeline activity runs only when the CheckPreviousRuns activity returns cnt = 0.
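
    For illustration, this gate is an IfCondition activity; in the pipeline JSON it looks roughly like the fragment below. This is a hedged sketch rather than the exact template JSON, and the branch activities are elided; the Execute DataverseToSQLPipeline activity proceeds only when the expression evaluates to false (cnt = 0):

    {
        "name": "If Previous Runs Unsuccessful",
        "type": "IfCondition",
        "typeProperties": {
            "expression": {
                "value": "@greater(activity('CheckPreviousRuns').output.firstRow.cnt,0)",
                "type": "Expression"
            }
        }
    }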

    In your case, per the CheckPreviousRuns output below, cnt = 88 (meaning you have 88 unsuccessful runs), which is why your If Previous Runs Unsuccessful condition is always true: execution enters the true path of the IF evaluation and the Execute DataverseToSQLPipeline activity is skipped.

    {
    	"firstRow": {
    		"cnt": 88
    	},
    	"effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (UK South)",
    	"billingReference": {
    		"activityType": "PipelineActivity",
    		"billableDuration": [
    			{
    				"meterType": "AzureIR",
    				"duration": 0.016666666666666666,
    				"unit": "DIUHours"
    			}
    		]
    	},
    	"durationInQueue": {
    		"integrationRuntimeQueue": 0
    	}
    }
    

    As called out by Ray Chung, the Execute DataverseToSQLPipeline activity is only triggered when the previous pipeline run's status = 1.

    Could you please set your pipeline concurrency to 1 and see if that helps to resolve the issue? This setting ensures that the previous pipeline run completes before the next run starts executing; until then, the latest pipeline runs are queued. Please note that there is a limit here: the maximum pipeline queue size is 100, and runs beyond that will fail.


    Hope this clarifies why Execute DataverseToSQLPipeline is skipped in your case. Please don't forget to Accept Answer and mark Yes for "was this answer helpful" wherever the information provided helps you; this can be beneficial to other community members.

    1 person found this answer helpful.

  2. Ray Chung 0 Reputation points
    2023-04-07T15:38:24.3333333+00:00

    I have the same issue. I figured out that the DataverseToSQL pipeline is only triggered when the check in the DataverseToSQL_Orchestrator passes (the status code of the previous runs must equal 1). In our situation the previous run was still processing when the new run was triggered. The status of the new run and the runs after it is set to code 3, which means the mentioned check always fails and DataverseToSQL is never triggered again. Do you already have a solution for this issue?

