Copy data from on-prem using a Synapse notebook

Mohammed Asif Khan 0 Reputation points
2024-08-20T07:48:21.1633333+00:00

Dear Team,

We would like to know whether we can copy data from on-premises sources using a Synapse notebook.

What would the setup requirements be?

Please let us know if this setup can be done.

I have gone through an MS article that talks about on-prem connectivity to Databricks, and I am wondering whether the same approach can be used for Synapse.

https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/on-prem-network

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. Vinodh247 16,826 Reputation points
    2024-08-20T13:47:40.16+00:00

    Hi Mohammed Asif Khan,

    Thanks for reaching out to Microsoft Q&A.

    To copy data from an on-premises source to Azure Synapse using notebooks, you'll need to set up a self-hosted integration runtime. Here are the key steps:

    1. Create a self-hosted integration runtime (SHIR)
    • In Synapse, go to the "Manage" hub and select "Integration runtimes" > click "New" and choose "Self-Hosted" as the integration runtime type
    • Provide a name and select the appropriate compute location
    2. Install the integration runtime on an on-premises machine
    • Download and run the self-hosted integration runtime setup on a machine that can access your on-premises data source
    • Register the runtime with the key provided in the Azure Synapse portal
    3. Create a linked service to your on-premises data source
    • Create a new linked service and select your data source type
    • Choose the self-hosted integration runtime created earlier and provide the connection details for your on-premises data source
    4. Use the linked service in your Synapse notebook
    • In a Synapse notebook, create a DataFrame from the data brought in via the linked service (a rough sketch follows this list)
    • You can then process and transform the data as needed within the notebook
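
    A minimal PySpark sketch of the last step, assuming the copy has already landed the data as Parquet files in the workspace's ADLS Gen2 storage (the storage account, container, and folder names below are hypothetical placeholders for wherever your copy lands the data):

    ```python
    # Synapse notebook cell (PySpark). All account/container/path names are
    # hypothetical placeholders.
    from pyspark.sql import functions as F

    landed_path = "abfss://raw@<your-storage-account>.dfs.core.windows.net/onprem/sales/"

    # Read the copied files into a DataFrame ('spark' is predefined in Synapse notebooks).
    df = spark.read.parquet(landed_path)

    # Example transformation: filter recent rows and aggregate by region.
    summary = (
        df.filter(F.col("order_date") >= "2024-01-01")
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount"))
    )
    summary.show()

    # Optionally write the result back to the lake as a curated dataset.
    summary.write.mode("overwrite").parquet(
        "abfss://curated@<your-storage-account>.dfs.core.windows.net/sales_summary/"
    )
    ```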

    Additional requirements:

    Network connectivity (if not already set up)

    • VPN/ExpressRoute: You need a secure network connection between your on-premises environment and Azure. This is typically done using a VPN Gateway or ExpressRoute. This allows your on-premises network to communicate with Azure Synapse securely.
    • Private Endpoints: Ensure that your Synapse workspace is configured with private endpoints, allowing your on-premises environment to connect to Synapse securely over the private link.

    Monitoring and Management

    • Integration Runtime Monitoring: Monitor the performance and availability of your SHIR to ensure that data transfers are occurring as expected.
    • Notebook Execution Monitoring: Track the progress of your notebook execution and handle any errors that may arise during the data transfer process.

    Please 'Upvote' (thumbs-up) and 'Accept as answer' if the reply was helpful. This will benefit other community members who face the same issue.


  2. Sina Salam 9,486 Reputation points
    2024-08-20T17:10:54.7866667+00:00

    Hello Mohammed Asif Khan,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you would like to copy data from on-premises using a Synapse notebook, and that you also asked about an article you read on on-premises connectivity to Databricks, wondering whether the same can be done for Synapse.

    Most of all, regarding your comment to @Martin B, and to add clarification on the solution @Vinodh247 provided:

    Yes, @Martin B is right that a self-hosted integration runtime (IR) can only be utilized by specific Synapse Pipeline activities, such as the Copy Activity and the SQL Script activity. Unfortunately, it cannot be directly leveraged within a Synapse notebook. Although Synapse notebooks can execute SQL queries, Python, and other code, they do not directly interact with the self-hosted IR.

    What you will need to do:

    1. For copying data from on-premises sources, you should use Synapse Pipelines with the self-hosted integration runtime. Create a pipeline with a Copy Activity that leverages the self-hosted IR to transfer data from your on-premises data source to Azure Synapse Analytics.
    2. Once the data is ingested into Azure Synapse Analytics, you can use Synapse Notebooks to process and analyze it. Use Spark or SQL-based operations in the notebooks for data transformation and analysis (a small sketch follows this list).
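
    For example, if the Copy Activity loaded the data into a dedicated SQL pool table, the notebook can read it with the Synapse Dedicated SQL Pool connector available in the workspace's Spark runtime; a minimal sketch, assuming a hypothetical table name and a Spark pool in the same workspace:

    ```python
    # Synapse notebook cell (PySpark). The database/schema/table name is a
    # hypothetical placeholder; synapsesql targets a dedicated SQL pool in
    # the same workspace.
    df = spark.read.synapsesql("SalesDW.dbo.Orders")

    # Transform with regular Spark / Spark SQL from here on.
    df.createOrReplaceTempView("orders")
    top_customers = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_amount
        FROM orders
        GROUP BY customer_id
        ORDER BY total_amount DESC
        LIMIT 10
    """)
    top_customers.show()
    ```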

    Other options:

    • Azure Data Factory: create an Azure Data Factory pipeline that utilizes the self-hosted IR, then trigger this pipeline from your Synapse notebook.
    • Direct SQL queries: if your on-premises data resides in a supported database (e.g., SQL Server) that is reachable from the Spark pool, you can connect directly from your notebook using a JDBC or ODBC connection (a sketch follows this list).
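
    For the direct SQL query option, here is a minimal JDBC sketch, assuming the source is SQL Server, that the Spark pool can actually reach it over the network (VPN/ExpressRoute), and that the server name, database, user, and Key Vault references below are hypothetical placeholders:

    ```python
    # Synapse notebook cell (PySpark). Server, database, table, user, and
    # Key Vault names are hypothetical; this only works if the Spark pool
    # has network line-of-sight to the on-premises SQL Server.
    from notebookutils import mssparkutils

    # Pull the password from Key Vault rather than hard-coding it.
    password = mssparkutils.credentials.getSecret("my-keyvault", "onprem-sql-password")

    jdbc_url = "jdbc:sqlserver://onprem-sql01.contoso.local:1433;databaseName=SalesDB"

    df = (
        spark.read.format("jdbc")
             .option("url", jdbc_url)
             .option("dbtable", "dbo.Orders")
             .option("user", "synapse_reader")
             .option("password", password)
             .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
             .load()
    )
    df.show(5)
    ```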

    Regarding your second question:

    The article you mentioned focuses on on-premises-to-Databricks connectivity, but the principles are similar for Synapse. Both services use integration runtimes and pipelines to move data. However, you will need to adapt the setup specifically for Synapse notebooks.

    Finally, your setup requirements depend on your available resources and which of the options above you choose.


    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    ** Please don't forget to close out the thread here by upvoting and accepting this as an answer if it is helpful ** so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam

