Onprem server/Database to Cloud - Data Copy Automation via Azure Data Factory

Nalini Bhavaraju 170 Reputation points
2026-02-09T15:42:54.12+00:00

Hi Team,

If I am copying data from an on-prem server/database to Azure Data Lake Storage via Azure Data Factory, can I automate the ADF pipeline so the copy happens once every day?

How is the self-hosted integration runtime managed in this case to avoid connection errors when the pipeline is automated?

Also, what are the pros and cons of keeping the self-hosted integration runtime service running for this pipeline execution, in terms of cost, CPU, and memory usage?

Is there any way to start and stop the service automatically around the scheduled run?

And if I automate the process and I am not onsite, will the pipeline fail to connect to the database/server, since the server is on-prem?

Thanks,

Nalini


2 answers

  1. Nalini Bhavaraju 170 Reputation points
    2026-02-10T19:18:46.94+00:00

    Hi Manoj,

    Thanks for the detailed writeup.

    Here are the answers to the follow-up questions:

    1. Do you have any specific data size or type you are working with that could impact the execution time of the pipeline? The data size is not very large, around 3-5 GB.
    2. Have you already set up your self-hosted integration runtime? Yes. But when I trigger my ADF pipeline to copy data from SQL to ADLS, I would like to start the runtime and stop it once the copy finishes.
    3. Are there specific times when you anticipate the data will be accessed more frequently or have stricter availability needs? I am planning to set up a once-daily trigger to copy the data; no specific time has been decided yet.
    4. Would you like guidance on setting up the Azure Function or Automation scripts? Yes. Is there documentation that helps me understand how to leverage Azure Functions/Azure Automation for the self-hosted integration runtime with Azure Data Factory?

    Thanks,

    Nalini.


  2. Manoj Kumar Boyini 8,570 Reputation points Microsoft External Staff Moderator
    2026-02-10T01:37:14.13+00:00

    Hi Nalini Bhavaraju,

    It looks like you’re trying to automate data copies from an on-prem server to Azure Data Lake Storage using Azure Data Factory (ADF). Here’s how you can manage that:

    Automating the ADF Pipeline

    Yes, you can definitely automate your ADF pipeline to run daily. You can set a schedule by using triggers within Azure Data Factory. This way, your data can be copied at a specific time each day without needing manual intervention.
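    A daily run is set up with a Schedule trigger attached to the pipeline. As a rough sketch, this is the JSON shape of a daily ScheduleTrigger as you would submit it through the ADF REST API or SDK (the pipeline name `CopyOnPremToADLS` and the start time are placeholders for your own values):

    ```python
    # Sketch: a daily ADF schedule trigger definition, expressed as the JSON
    # payload you would PUT via the REST API or azure-mgmt-datafactory SDK.
    # "CopyOnPremToADLS" and the startTime are placeholders.
    import json

    daily_trigger = {
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",   # run once per day
                    "interval": 1,
                    "startTime": "2026-02-11T02:00:00Z",  # first run (UTC)
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": "CopyOnPremToADLS",
                        "type": "PipelineReference",
                    }
                }
            ],
        }
    }

    print(json.dumps(daily_trigger, indent=2))
    ```

    You can also create the same trigger interactively in ADF Studio (Add trigger > New/Edit), which generates equivalent JSON behind the scenes.
    
    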

    Self-hosted Integration Runtime Management

    When using a self-hosted integration runtime to connect to your on-premises database, it's essential to keep it running to avoid connection errors during automation. If it's stopped, any scheduled pipeline execution might fail to connect to the database.

    Pros and Cons of Keeping the Self-hosted Environment Running

    Pros:

    • Continuous Availability: Always ready to process data, which helps in meeting your scheduled automation requirements.
    • Reduced Connection Errors: Less likelihood of running into connectivity issues during scheduled jobs.

    Cons:

    • Cost: The self-hosted integration runtime software itself is free, but the machine hosting it (an Azure VM or an on-prem server) incurs compute cost while it runs, even when idle.
    • CPU and Memory Usage: Continuous operation can lead to higher CPU and memory consumption, which may not be efficient if the integration runtime is used infrequently.

    Automatic Start and Stop

    Currently, Azure Data Factory doesn't support automatic start/stop of the self-hosted integration runtime directly. However, you can leverage Azure Functions or Azure Automation to script the stopping and starting of the integration runtime, allowing it to only run during specific hours when data copy operations are scheduled.
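    As a minimal sketch of that scripted approach: the self-hosted IR typically runs as a Windows service (usually named `DIAHostService`; verify the exact name on your machine with `sc query`), so a small script scheduled by Task Scheduler, Azure Automation, or a timer-triggered Azure Function can stop it outside the nightly copy window. The window times below are assumptions you would replace with your own schedule:

    ```python
    # Sketch: keep the self-hosted IR service running only around the nightly
    # copy window. Assumes the SHIR Windows service is named "DIAHostService";
    # check the actual name on your host before using this.
    import subprocess
    from datetime import datetime, time

    def in_copy_window(now, start=time(1, 30), end=time(4, 0)):
        """True if `now` (a datetime) falls inside the daily copy window (UTC)."""
        return start <= now.time() <= end

    def set_ir_service(running, service="DIAHostService"):
        """Start or stop the SHIR Windows service via `sc` (needs admin rights)."""
        action = "start" if running else "stop"
        subprocess.run(["sc", action, service], check=True)

    # Example invocation (requires Windows + admin, so not run here):
    #   set_ir_service(in_copy_window(datetime.utcnow()))
    ```

    If the runtime is hosted on an Azure VM, an often simpler variant is to deallocate the whole VM outside the window (e.g., via an Azure Automation runbook), which also stops the VM's compute billing.
    
    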

    Failure to Connect When Offsite

    If the ADF pipeline is automated to run while you are not onsite, it shouldn’t fail to connect to the on-prem DB as long as the integration runtime is properly configured and running. It’s crucial to ensure that the integration runtime has the necessary access permissions and that there are no firewall rules blocking the connection.
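    To verify that last point ahead of time, a quick reachability check run from the SHIR host can confirm the firewall allows it to reach the on-prem database before the scheduled run. The host name below is a placeholder, and 1433 is the default SQL Server port:

    ```python
    # Sketch: TCP reachability check from the SHIR host to the on-prem
    # SQL Server. Host/port are placeholders for your environment.
    import socket

    def can_reach(host, port, timeout=3.0):
        """Return True if a TCP connection to host:port succeeds within timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Example: can_reach("onprem-sql01.corp.local", 1433)
    ```

    If this returns False from the SHIR host, the scheduled pipeline will fail with a connectivity error regardless of whether anyone is onsite.
    
    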

    Follow-up Questions:

    1. Do you have any specific data size or type you are working with that could impact the execution time of the pipeline?
    2. Have you already set up your self-hosted integration runtime?
    3. Are there specific times when you anticipate the data will be accessed more frequently or have stricter availability needs?
    4. Would you like guidance on setting up the Azure Function or Automation scripts?

    Hope this helps get your data workflow up and running! If you have further questions, feel free to ask!
