Copy data from on-prem using a Synapse notebook

Mohammed Asif Khan 0 Reputation points
2024-08-20T07:48:21.1633333+00:00

Dear Team,

We would like to know whether we can copy data from on-premises sources using a Synapse notebook.

What would the setup requirements be?

Please let us know if this setup can be done.

I have gone through an MS article that talks about on-prem connectivity to Databricks, and I am wondering whether the same approach can be used for Synapse.

https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/on-prem-network

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. Vinodh247 16,826 Reputation points
    2024-08-20T13:47:40.16+00:00

    Hi Mohammed Asif Khan,

    Thanks for reaching out to Microsoft Q&A.

    To copy data from an on-premises source to Azure Synapse using notebooks, you'll need to set up a self-hosted integration runtime. Here are the key steps:

    1. Create a self-hosted integration runtime (SHIR)
    • In Synapse, go to the "Manage" hub and select "Integration runtimes" > click "New" and choose "Self-Hosted" as the integration runtime type
    • Provide a name and select the appropriate compute location
    2. Install the integration runtime on an on-premises machine
    • Download and run the self-hosted integration runtime setup on a machine that can access your on-premises data source
    • Register the runtime with the key provided in the Azure Synapse portal
    3. Create a linked service to your on-premises data source
    • Create a new linked service and select your data source type
    • Choose the self-hosted integration runtime created earlier and provide the connection details for your on-premises data source
    4. Use the linked service in your Synapse notebook
    • In a Synapse notebook, create a DataFrame from the data brought in via the linked service (a rough sketch follows this list)
    • You can then process and transform the data as needed within the notebook
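
    A minimal PySpark sketch of the last step, assuming the copy has already landed the data as Parquet files in the workspace's ADLS Gen2 storage (the storage account, container, and folder names below are hypothetical placeholders for wherever your copy lands the data):

    ```python
    # Synapse notebook cell (PySpark). All account/container/path names are
    # hypothetical placeholders.
    from pyspark.sql import functions as F

    landed_path = "abfss://raw@<your-storage-account>.dfs.core.windows.net/onprem/sales/"

    # Read the copied files into a DataFrame ('spark' is predefined in Synapse notebooks).
    df = spark.read.parquet(landed_path)

    # Example transformation: filter recent rows and aggregate by region.
    summary = (
        df.filter(F.col("order_date") >= "2024-01-01")
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount"))
    )
    summary.show()

    # Optionally write the result back to the lake as a curated dataset.
    summary.write.mode("overwrite").parquet(
        "abfss://curated@<your-storage-account>.dfs.core.windows.net/sales_summary/"
    )
    ```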

    Additional requirements:

    Network connectivity (if not already set up)

    • VPN/ExpressRoute: You need a secure network connection between your on-premises environment and Azure. This is typically done using a VPN Gateway or ExpressRoute. This allows your on-premises network to communicate with Azure Synapse securely.
    • Private Endpoints: Ensure that your Synapse workspace is configured with private endpoints, allowing your on-premises environment to connect to Synapse securely over the private link.

    Monitoring and Management

    • Integration Runtime Monitoring: Monitor the performance and availability of your SHIR to ensure that data transfers are occurring as expected.
    • Notebook Execution Monitoring: Track the progress of your notebook execution and handle any errors that may arise during the data transfer process.

    Please 'Upvote' (thumbs-up) and 'Accept as answer' if the reply was helpful. This will benefit other community members who face the same issue.


  2. Sina Salam 9,486 Reputation points
    2024-08-20T17:10:54.7866667+00:00

    Hello Mohammed Asif Khan,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you would like to copy data from on-premises using a Synapse notebook, and that you also asked about an article you read on on-premises connectivity to Databricks, wondering whether the same can be done for Synapse.

    Most of all, regarding your comment to @Martin B, and to add clarification on the solution @Vinodh247 provided:

    Yes, @Martin B is right that a self-hosted integration runtime (IR) can only be utilized by specific Synapse Pipeline activities, such as the Copy Activity and the SQL Script activity. Unfortunately, it cannot be directly leveraged within a Synapse notebook. Although Synapse notebooks can execute SQL queries, Python, and other code, they do not directly interact with the self-hosted IR.

    What you will need to do:

    1. For copying data from on-premises sources, you should use Synapse Pipelines with the self-hosted integration runtime. Create a pipeline with a Copy Activity that leverages the self-hosted IR to transfer data from your on-premises data source to Azure Synapse Analytics.
    2. Once the data is ingested into Azure Synapse Analytics, you can use Synapse Notebooks to process and analyze it. Use Spark or SQL-based operations in the notebooks for data transformation and analysis (a small sketch follows this list).
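
    For example, if the Copy Activity loaded the data into a dedicated SQL pool table, the notebook can read it with the Synapse Dedicated SQL Pool connector available in the workspace's Spark runtime; a minimal sketch, assuming a hypothetical table name and a Spark pool in the same workspace:

    ```python
    # Synapse notebook cell (PySpark). The database/schema/table name is a
    # hypothetical placeholder; synapsesql targets a dedicated SQL pool in
    # the same workspace.
    df = spark.read.synapsesql("SalesDW.dbo.Orders")

    # Transform with regular Spark / Spark SQL from here on.
    df.createOrReplaceTempView("orders")
    top_customers = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_amount
        FROM orders
        GROUP BY customer_id
        ORDER BY total_amount DESC
        LIMIT 10
    """)
    top_customers.show()
    ```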

    Other options:

    • Azure Data Factory: create an Azure Data Factory pipeline that utilizes the self-hosted IR, then trigger this pipeline from your Synapse notebook.
    • Direct SQL queries: if your on-premises data resides in a supported database (e.g., SQL Server) that is reachable from the Spark pool, you can connect directly from your notebook using a JDBC or ODBC connection (a sketch follows this list).
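
    For the direct SQL query option, here is a minimal JDBC sketch, assuming the source is SQL Server, that the Spark pool can actually reach it over the network (VPN/ExpressRoute), and that the server name, database, user, and Key Vault references below are hypothetical placeholders:

    ```python
    # Synapse notebook cell (PySpark). Server, database, table, user, and
    # Key Vault names are hypothetical; this only works if the Spark pool
    # has network line-of-sight to the on-premises SQL Server.
    from notebookutils import mssparkutils

    # Pull the password from Key Vault rather than hard-coding it.
    password = mssparkutils.credentials.getSecret("my-keyvault", "onprem-sql-password")

    jdbc_url = "jdbc:sqlserver://onprem-sql01.contoso.local:1433;databaseName=SalesDB"

    df = (
        spark.read.format("jdbc")
             .option("url", jdbc_url)
             .option("dbtable", "dbo.Orders")
             .option("user", "synapse_reader")
             .option("password", password)
             .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
             .load()
    )
    df.show(5)
    ```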

    Regarding your second question:

    The article you mentioned focuses on on-premises-to-Databricks connectivity, but the principles are similar for Synapse. Both services use integration runtimes and pipelines to move data. However, you will need to adapt the setup specifically for Synapse notebooks.

    Finally, your setup requirements depend on your available resources and which of the options above you choose.


    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    ** Please don't forget to close out the thread here by upvoting and accepting this as an answer if it is helpful ** so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam

