Need help to decide Salesforce to SQL Server ADF pipeline frequency

Vipin Sharma 116 Reputation points
2022-10-06T07:07:24.173+00:00

Hi,

I am using ADF pipelines to move data from Salesforce to SQL Server. Since real-time sync is not possible, we want to decide on a sync frequency and keep it as short as possible, probably 5 minutes.

My question is: suppose we set the sync frequency to 5 minutes because Salesforce normally receives few updates. There can be a situation where a large number of records are updated in Salesforce for some reason and the ADF pipeline run takes 30 minutes. In that case, which of the following is true, and is there any configuration that lets us choose between these options?

  1. All sync processes start on schedule, so several of them might run in parallel; each may have a different set of data to copy, so there are no conflicts.
  2. The next few sync processes go into a queue and are executed one by one after the 30-minute sync process has finished.
  3. ADF discards the runs that were scheduled to start while the 30-minute sync was in progress and starts the next scheduled run once the 30-minute sync has completed.
Azure Data Factory

Accepted answer
  1. AnnuKumari-MSFT 32,161 Reputation points Microsoft Employee
    2022-10-07T10:06:49.93+00:00

    Hi @Vipin Sharma ,

    Thank you for using the Microsoft Q&A platform and for posting your question here.

    As I understand your question, you want to know how to trigger a pipeline repeatedly at a fixed interval while making each run depend on the previous run. Please let me know if I have misunderstood anything.

    Of the three scenarios you describe, the first is closest to what actually happens. If a schedule trigger is associated with the pipeline, consecutive runs are triggered regardless of whether the previous run is still in progress. In this case, conflicts may or may not arise, depending on factors such as self-hosted IR node availability, resource deadlocks on the source server, and insert or update conflicts on the target table.
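To make the overlap concrete, here is a minimal sketch in plain Python (it does not call ADF). It assumes a schedule trigger that fires every 5 minutes no matter what, one run that takes 30 minutes, and later runs that take 2 minutes each. The function name `overlapping_runs` and all durations are hypothetical, chosen only to mirror the numbers in the question.

```python
from datetime import datetime, timedelta


def overlapping_runs(start, interval, durations):
    """For each run, list the indices of earlier runs still in progress when it starts.

    Models a trigger that starts run i at start + i * interval regardless of
    whether earlier runs have finished.
    """
    runs = []
    for i, duration in enumerate(durations):
        begin = start + i * interval
        runs.append((begin, begin + duration))

    overlaps = []
    for i, (begin, _end) in enumerate(runs):
        # An earlier run overlaps if it ends strictly after this run begins.
        overlaps.append([j for j in range(i) if runs[j][1] > begin])
    return overlaps


# Hypothetical timeline: first run takes 30 minutes, the next six take 2 minutes.
start = datetime(2022, 10, 7, 10, 0)
interval = timedelta(minutes=5)
durations = [timedelta(minutes=30)] + [timedelta(minutes=2)] * 6

result = overlapping_runs(start, interval, durations)
for index, active in enumerate(result):
    print(f"run {index} starts while runs {active} are still in progress")
```

Under these assumptions, the five runs that start during the 30-minute run all execute alongside it, which is where the conflicts listed above can come from.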

    The best approach here would be to use a tumbling window trigger, which lets you add a self-dependency on the trigger.

    With a self-dependency in place, the trigger does not proceed to the next window until the preceding window has completed successfully.
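As a rough illustration of such a trigger, the sketch below builds a tumbling window trigger definition as a Python dictionary and prints it as JSON. The property names follow the documented tumbling window trigger schema as I understand it; the trigger name, pipeline name, start time, and the pipeline parameter names `windowStart` and `windowEnd` are hypothetical placeholders. Treat it as a starting point to check against the linked documentation, not a verified definition.

```python
import json


def format_timespan(minutes):
    """Format a whole number of minutes as an hh:mm:ss timespan string."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}:00"


def build_trigger(name, pipeline_name, interval_minutes, start_time):
    """Build a tumbling window trigger definition that depends on its own previous window."""
    window = format_timespan(interval_minutes)
    return {
        "name": name,
        "properties": {
            "type": "TumblingWindowTrigger",
            "typeProperties": {
                "frequency": "Minute",
                "interval": interval_minutes,
                "startTime": start_time,
                # Process one window at a time.
                "maxConcurrency": 1,
                "dependsOn": [
                    {
                        # Self-dependency: wait for the immediately preceding window.
                        "type": "SelfDependencyTumblingWindowTriggerReference",
                        "offset": "-" + window,
                        "size": window,
                    }
                ],
            },
            "pipeline": {
                "pipelineReference": {
                    "type": "PipelineReference",
                    "referenceName": pipeline_name,
                },
                # Hypothetical pipeline parameters used to filter the source by window.
                "parameters": {
                    "windowStart": "@trigger().outputs.windowStartTime",
                    "windowEnd": "@trigger().outputs.windowEndTime",
                },
            },
        },
    }


# Hypothetical names and start time, for illustration only.
trigger = build_trigger(
    name="SalesforceSyncTrigger",
    pipeline_name="SalesforceToSqlServer",
    interval_minutes=5,
    start_time="2022-10-07T00:00:00Z",
)
print(json.dumps(trigger, indent=2))
```

Passing the window start and end times into the pipeline lets each run copy only the records changed in its own window, so a long-running window delays the following windows instead of overlapping with them.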

    For more details, see:
    Tumbling window self-dependency properties
    Tumbling Window Trigger Dependency in Azure Data Factory

    I hope this helps. Please let us know if you have any further queries.

