Reprocess Dataverse data lake previous date folders

Karthik Eetur 21 Reputation points
2023-08-15T02:40:59.0533333+00:00

I've used the "copy Dataverse to SQL" pipeline (dataflow) below, suggested by MS.

https://learn.microsoft.com/en-us/power-apps/maker/data-platform/azure-synapse-link-pipelines?tabs=synapse-analytics

I am running Synapse pipelines every 15 minutes to copy data from incremental folders to a SQL database.

I have a scenario where the main dataflow pipeline runs for more than 15 minutes, and the orchestrator pipeline skips (status = 3) all subsequent pipeline runs until the in-progress run completes. I then have to manually clear the ProcessingLog table entries with status <> 1 before the orchestrator pipeline will pick up new folders for processing.

How do I reprocess the missed previous datetime folders using the same MS flow?

I don't think the same MS flow works for reprocessing previous datetime folders once new folders have been ingested into the database and maxrowversion is already the latest one. For example: reprocessing 20-30 old folders to catch up on data.

Appreciate any help here

Azure Synapse Analytics

Answer accepted by question author
  1. KranthiPakala-MSFT 46,737 Reputation points Microsoft Employee Moderator
    2023-08-15T22:42:13.19+00:00

    @Karthik Eetur Welcome to Microsoft Q&A forum and thanks for reaching out here.

    When the Orchestrator pipeline encounters an unexpected error and fails for a given folder (e.g., Incremental folder 1), DataverseToSQLPipelineProcessingLog records a 0 (failure) for that folder. Subsequent folders (e.g., Incremental folder 2) are then marked 3 (skipped) rather than successful (1). Because the previous folder is not successful (status code <> 1), all subsequent folders keep being skipped (status code = 3). To break out of this skipping loop, you may follow the resolution steps below, suggested by the product team:

    1. Identify and resolve the root cause of the pipeline failure.
    2. Manually run the DataverseToSQL pipeline for the failed folder.
    3. After the manual run completes successfully, update the corresponding row in DataverseToSQLPipelineProcessingLog to 1 (success).
    4. Chronologically, sequentially, and manually process the subsequent skipped (3) folders, and update the Status column of the corresponding DataverseToSQLPipelineProcessingLog rows to 1 (success).
    5. Once the failed folder and all the skipped folders are marked successful, the DataverseToSQL_Orchestrator pipeline will automatically process the next folder on the next trigger.
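    The log updates in steps 3 and 4 can be sketched as follows, using an in-memory SQLite table as a hypothetical, simplified stand-in for DataverseToSQLPipelineProcessingLog (the column names and folder-name format are assumptions; adapt the UPDATE/SELECT to your actual schema):

```python
import sqlite3

# Hypothetical, simplified version of the DataverseToSQLPipelineProcessingLog
# table. Status codes follow the answer above: 1 = success, 0 = failed, 3 = skipped.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DataverseToSQLPipelineProcessingLog (
        FolderName TEXT PRIMARY KEY,
        Status     INTEGER
    )
""")
conn.executemany(
    "INSERT INTO DataverseToSQLPipelineProcessingLog VALUES (?, ?)",
    [("2023-08-14T10.00.00Z", 1),   # processed successfully
     ("2023-08-14T10.15.00Z", 0),   # failed run (root cause now fixed, rerun manually)
     ("2023-08-14T10.30.00Z", 3),   # skipped because the previous folder failed
     ("2023-08-14T10.45.00Z", 3)],  # skipped
)

# Step 3: after manually rerunning DataverseToSQL for the failed folder,
# mark that row as successful so the orchestrator no longer sees a failure.
conn.execute(
    "UPDATE DataverseToSQLPipelineProcessingLog SET Status = 1 WHERE FolderName = ?",
    ("2023-08-14T10.15.00Z",),
)

# Step 4: list the skipped folders in chronological order so they can be
# reprocessed one by one and marked successful afterwards.
skipped = [row[0] for row in conn.execute(
    "SELECT FolderName FROM DataverseToSQLPipelineProcessingLog "
    "WHERE Status = 3 ORDER BY FolderName"
)]
print(skipped)
```

    In the real table you would run the equivalent UPDATE/SELECT against your SQL database rather than SQLite.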

    You may ask: there are several skipped folders, and manually executing them and updating DataverseToSQLPipelineProcessingLog is time consuming and error prone. Is there a better approach?

    Recommendation for automated reprocessing:

    Yes. In that case, to avoid manual effort, please consider a new pipeline which:

    1. Retrieves the skipped folders chronologically.
    2. Executes the DataverseToSQL pipeline sequentially.
    3. Updates the Status column of the corresponding DataverseToSQLPipelineProcessingLog rows to 1 (success).

    Below is a sample view of what such a pipeline looks like for processing skipped folders:

    [Screenshots of the sample reprocessing pipeline were attached in the original answer but are not reproduced here.]
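    The three steps above can be sketched in Python. This is a minimal illustration, not the actual Synapse pipeline: run_dataverse_to_sql is a hypothetical stand-in for an Execute Pipeline activity inside a sequential ForEach, and the ProcessingLog table and columns are simplified assumptions.

```python
import sqlite3

# Hypothetical stand-in for triggering the DataverseToSQL pipeline for one
# folder; in Synapse this would be an Execute Pipeline activity (or a REST
# call) inside a ForEach with sequential execution enabled.
def run_dataverse_to_sql(folder: str) -> bool:
    print(f"reprocessing {folder}")
    return True  # assume the rerun succeeds

# Simplified, assumed schema for the processing log (status 1 = success, 3 = skipped).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProcessingLog (FolderName TEXT PRIMARY KEY, Status INTEGER)")
conn.executemany("INSERT INTO ProcessingLog VALUES (?, ?)", [
    ("2023-08-14T10.00.00Z", 1),
    ("2023-08-14T10.15.00Z", 3),
    ("2023-08-14T10.30.00Z", 3),
])

# 1. Retrieve the skipped folders chronologically.
skipped = [r[0] for r in conn.execute(
    "SELECT FolderName FROM ProcessingLog WHERE Status = 3 ORDER BY FolderName")]

# 2-3. Execute the pipeline sequentially and, on success, mark each folder as 1.
for folder in skipped:
    if run_dataverse_to_sql(folder):
        conn.execute("UPDATE ProcessingLog SET Status = 1 WHERE FolderName = ?", (folder,))

remaining = conn.execute(
    "SELECT COUNT(*) FROM ProcessingLog WHERE Status <> 1").fetchone()[0]
print(remaining)
```

    Once every row is back to 1, the orchestrator pipeline resumes processing new folders on its next trigger, as described in step 5 above.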

    NOTE: To avoid overlapping execution times, it is necessary to set concurrency = 1 on your orchestrator pipeline. That way only one execution is in progress at a time, and subsequent pipeline runs are queued and executed in chronological order. Ensuring the orchestrator pipeline's Concurrency setting is 1 will prevent this scenario in the future.

    I would also recommend reading this thread, where I had a detailed discussion with another user on this implementation: Copy Dataverse data into Azure SQL using Synapse Link not running the DaterverseToSQL step

    Hope this info helps.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.

