Insert/Update data in Azure Synapse Dedicated SQL Pool and keep data sync in ADLS Gen2.

Mayank Patel

We have an Azure Synapse Dedicated SQL Pool and load Parquet files from ADLS Gen2 into it.

Now we have a use case where an external application needs to insert, update and delete data in the Azure Synapse Dedicated SQL Pool. However, ADLS Gen2 is the source of truth, so how do we keep ADLS Gen2 and the Dedicated SQL Pool in sync?

AnnuKumari-MSFT, Microsoft Employee, Moderator

2023-11-28T06:29:50.45+00:00

Mayank Patel,

Just following up to see if the answer below helped. Please consider clicking Accept Answer, as accepted answers help the community as well. Also, please click Yes for the survey 'Was the answer helpful'.

Answer accepted by question author

Answer 1

AnnuKumari-MSFT, Microsoft Employee, Moderator

Hi Mayank Patel,

Welcome to the Microsoft Q&A platform, and thanks for posting your query here. As I understand it, you are looking for a way to load the data in Parquet files in ADLS Gen2 into an Azure Synapse Dedicated SQL Pool, and the data updated by the external application needs to flow through to the Dedicated SQL Pool.

Could you please share more details about the external application? Which data store does the external application point to? Does the data from the external application land in ADLS first, and does it then need to be copied to the Dedicated SQL Pool?

If you are treating ADLS Gen2 as the staging layer for copying data from the external application to the Dedicated SQL Pool, you can consider creating Synapse pipelines to do this.

The videos below walk through the entire workflow of building an end-to-end pipeline for full and incremental loads, using ADLS Gen2 as the staging layer that stores the data in Parquet format before it is loaded into the Dedicated SQL Pool. You can customize the solution to your requirements:

How to do full load from On Premise SQL Server till ADLS using Azure Synapse Pipelines

How to load latest and greatest data from ADLS to Dedicated SQL Pool using Synapse Pipelines

How to perform incremental load from OnPremise SQL server to Dedicated Sql pool

How to perform Upsert for Incremental records using Azure Synapse Pipelines
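For the upsert step, one common way to apply incremental records that have already been staged in the Dedicated SQL Pool is a T-SQL `MERGE` from a staging table into the target table. The sketch below only builds that statement as a string; the table names (`dbo.sales`, `dbo.stg_sales`) and columns are hypothetical, and the video may use a different mechanism. `MERGE` also has some restrictions in dedicated SQL pools, so check the T-SQL `MERGE` documentation for Azure Synapse Analytics before relying on it.

```python
def build_merge(target: str, staging: str, keys: list[str], columns: list[str]) -> str:
    """Build a T-SQL MERGE that upserts rows from a staging table into a target table."""
    if not keys or not set(keys) <= set(columns):
        raise ValueError("keys must be a non-empty subset of columns")
    # Rows are matched on the business key column(s)
    on_clause = " AND ".join(f"t.[{k}] = s.[{k}]" for k in keys)
    # Non-key columns are overwritten when a matching row already exists
    set_clause = ", ".join(f"t.[{c}] = s.[{c}]" for c in columns if c not in keys)
    # Every column is inserted for keys that are not yet in the target
    col_list = ", ".join(f"[{c}]" for c in columns)
    val_list = ", ".join(f"s.[{c}]" for c in columns)
    lines = [f"MERGE INTO {target} AS t", f"USING {staging} AS s", f"ON {on_clause}"]
    if set_clause:  # a table made only of key columns has nothing to update
        lines.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    lines.append(f"WHEN NOT MATCHED BY TARGET THEN INSERT ({col_list}) VALUES ({val_list});")
    return "\n".join(lines)


print(build_merge("dbo.sales", "dbo.stg_sales", ["sale_id"], ["sale_id", "amount", "last_modified"]))
```

The sketch deliberately has no `WHEN NOT MATCHED BY SOURCE THEN DELETE` clause: with an incremental staging table, that clause would delete every target row that simply did not change in this batch.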

Hope this helps. Please let us know if you have any further queries, and kindly share more details about the external application. Thank you.

Mayank Patel

2023-11-01T20:45:27.5766667+00:00

Hi @AnnuKumari-MSFT, thanks for your response. The external application is an analytics platform that connects to the Azure Synapse Dedicated SQL Pool using an ODBC driver for READ/WRITE operations. READ operations are fine, since we load the current data from ADLS Gen2, but the WRITE-back operation only updates the Dedicated SQL Pool, not ADLS Gen2. We are looking for a way to keep the Dedicated SQL Pool and ADLS Gen2 in sync. Also, the external application doesn't support an ADLS Gen2 connection, so the only way for it to interact with the data is through the Dedicated SQL Pool using the ODBC driver.

AnnuKumari-MSFT, Microsoft Employee, Moderator

2023-11-09T09:36:43.6366667+00:00

Mayank Patel,

Apologies for the delayed response. I understand that you want to keep the data in the Azure Synapse Dedicated SQL Pool and Azure Data Lake Storage Gen2 in sync, where an external application connects over ODBC.

There are two ways to achieve this. The first is to create a data pipeline, since connectors are available for all the data stores involved in your requirement: ODBC, the Dedicated SQL Pool and ADLS. You can create a pipeline that first loads the data to ADLS and then to the Dedicated SQL Pool, as described in the videos above.
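The two-step pipeline described above can be outlined as pipeline JSON: one Copy activity from the ODBC source to Parquet in ADLS Gen2, and a second one, which runs only after the first succeeds, from that Parquet data into the Dedicated SQL Pool. The sketch builds the JSON as a Python dictionary. All pipeline, activity and dataset names (`ds_odbc_sales`, `ds_adls_sales_parquet`, `ds_sqlpool_sales`) and the source query are hypothetical, the datasets and linked services are assumed to exist already, and the property set is trimmed, so compare it with the JSON that Synapse Studio generates before using it.

```python
import json


def copy_activity(name, source_ds, sink_ds, source, sink, depends_on=None):
    """Return one Copy activity in pipeline-JSON shape (dataset names are hypothetical)."""
    activity = {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": source_ds, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_ds, "type": "DatasetReference"}],
        "typeProperties": {"source": source, "sink": sink},
    }
    if depends_on:
        # Run this activity only after the upstream activity has succeeded
        activity["dependsOn"] = [{"activity": depends_on, "dependencyConditions": ["Succeeded"]}]
    return activity


pipeline = {
    "name": "pl_odbc_to_adls_to_sqlpool",
    "properties": {
        "activities": [
            # Step 1: land the data from the ODBC source in ADLS Gen2 as Parquet
            copy_activity(
                "OdbcToAdlsParquet", "ds_odbc_sales", "ds_adls_sales_parquet",
                source={"type": "OdbcSource", "query": "SELECT * FROM sales"},
                sink={"type": "ParquetSink",
                      "storeSettings": {"type": "AzureBlobFSWriteSettings"}},
            ),
            # Step 2: load the same Parquet files into the Dedicated SQL Pool
            copy_activity(
                "AdlsParquetToSqlPool", "ds_adls_sales_parquet", "ds_sqlpool_sales",
                source={"type": "ParquetSource",
                        "storeSettings": {"type": "AzureBlobFSReadSettings", "recursive": True}},
                sink={"type": "SqlDWSink", "allowCopyCommand": True},
                depends_on="OdbcToAdlsParquet",
            ),
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

Because the Parquet dataset is the output of the first activity and the input of the second, ADLS Gen2 always holds the data that was loaded into the pool.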

The other way is to create a notebook and write code that connects to Azure Synapse Analytics from Python through the ODBC driver, using the pyodbc module. Below are some helpful resources:

https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/connect-overview#odbc-connection-string-example

https://docs.devart.com/odbc/sqlsynapse/python.htm
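A minimal sketch of the notebook approach, following the ODBC connection-string pattern from the first link. The server, database, credentials and the driver name `ODBC Driver 18 for SQL Server` are assumptions: use whichever Microsoft ODBC driver is installed and the dedicated SQL endpoint shown for your workspace. pyodbc is imported only inside `run_query`, so the string builder runs without it, and values containing `;`, `{` or `}` are not escaped.

```python
def build_conn_str(server: str, database: str, user: str, password: str,
                   driver: str = "ODBC Driver 18 for SQL Server") -> str:
    """Assemble an ODBC connection string for a Synapse dedicated SQL pool endpoint."""
    # Values are inserted verbatim (no escaping), so avoid ';', '{' and '}' in them
    parts = {
        "Driver": "{" + driver + "}",
        "Server": f"tcp:{server},1433",
        "Database": database,
        "Uid": user,
        "Pwd": password,
        "Encrypt": "yes",
        "TrustServerCertificate": "no",
        "Connection Timeout": "30",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


def run_query(conn_str: str, sql: str, *params):
    """Execute one parameterized SELECT and return all rows (needs pyodbc and the driver)."""
    import pyodbc  # imported here so build_conn_str works where pyodbc is not installed

    conn = pyodbc.connect(conn_str)
    try:
        cursor = conn.cursor()
        cursor.execute(sql, *params)  # '?' placeholders are bound to params
        return cursor.fetchall()
    finally:
        conn.close()  # always release the connection, even if the query fails
```

Usage, with placeholder values: `run_query(build_conn_str("myworkspace.sql.azuresynapse.net", "mypool", "sqladminuser", "<password>"), "SELECT TOP 10 * FROM dbo.sales WHERE amount > ?", 100)`.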

Hope this helps. Kindly accept the answer by clicking the Accept answer button. Thank you.

AnnuKumari-MSFT, Microsoft Employee, Moderator

2023-11-22T08:51:09.9933333+00:00

Mayank Patel,

In case the above answer helped, kindly consider accepting it by clicking the Accept answer button. Thank you.

Answer 2

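To keep Azure Data Lake Storage Gen2 (ADLS Gen2) and the Azure Synapse Dedicated SQL Pool in sync, you can use Azure Data Factory (ADF) to copy data from ADLS Gen2 to the SQL Pool and vice versa. A storage event trigger can start a pipeline when files are created or deleted in ADLS Gen2. There is no equivalent event trigger for row changes inside the SQL Pool, so the SQL Pool to ADLS Gen2 direction typically runs on a schedule (or tumbling window) trigger and copies only the rows that changed since the last run to the other location.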
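The "copy only what changed since the last run" step is usually done with a watermark: store the newest modification time already copied, select rows newer than it, then advance it. The sketch shows that logic in plain Python on in-memory rows, plus the equivalent filter as a source query. It assumes, hypothetically, that every table the external application writes to has a `last_modified` column that the application sets on each insert or update; in ADF the same pattern is commonly built from a Lookup activity, a Copy activity with a filtered query and a step that saves the new watermark.

```python
from datetime import datetime
from typing import Any

Row = dict[str, Any]


def changed_since(rows: list[Row], watermark: datetime,
                  ts_field: str = "last_modified") -> tuple[list[Row], datetime]:
    """Return rows modified after `watermark` and the watermark to store for the next run."""
    # Only rows strictly newer than the previous high-water mark are copied
    changed = [r for r in rows if r[ts_field] > watermark]
    # Advance to the newest change seen; keep the old watermark if nothing changed
    new_watermark = max((r[ts_field] for r in changed), default=watermark)
    return changed, new_watermark


def source_query(table: str, watermark: datetime, ts_field: str = "last_modified") -> str:
    """Equivalent filter for a Copy activity source query (table name is trusted config)."""
    return f"SELECT * FROM {table} WHERE [{ts_field}] > '{watermark:%Y-%m-%d %H:%M:%S}'"


rows = [
    {"sale_id": 1, "last_modified": datetime(2023, 11, 1, 8, 0)},
    {"sale_id": 2, "last_modified": datetime(2023, 11, 5, 9, 30)},
]
changed, wm = changed_since(rows, datetime(2023, 11, 2))
print([r["sale_id"] for r in changed], wm)  # [2] 2023-11-05 09:30:00
print(source_query("dbo.sales", datetime(2023, 11, 2)))
```

A timestamp watermark cannot see deleted rows, so DELETEs from the external application would need something extra, such as a soft-delete flag or a periodic full export.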

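Alternatively, you can use Azure Synapse Analytics to create an external table in the SQL Pool that references the data in ADLS Gen2. This lets you query the data in ADLS Gen2 from the SQL Pool without copying it, and when the data in ADLS Gen2 changes, queries against the external table reflect the change. Note that this only covers reads: external tables do not support UPDATE or DELETE, so writes made by the external application still have to be propagated to ADLS Gen2 separately.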
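An external table over Parquet files in a dedicated SQL pool is declared with `CREATE EXTERNAL TABLE ... WITH (LOCATION, DATA_SOURCE, FILE_FORMAT)`. The sketch builds that statement; the table, columns and folder are hypothetical, as are `adls_source` (an external data source pointing at the ADLS Gen2 container) and `parquet_format` (an external file format created with `FORMAT_TYPE = PARQUET`), both assumed to exist already.

```python
def build_external_table_ddl(table: str, columns: list[tuple[str, str]], location: str,
                             data_source: str = "adls_source",
                             file_format: str = "parquet_format") -> str:
    """Build CREATE EXTERNAL TABLE DDL over Parquet files already stored in ADLS Gen2."""
    # One "[name] TYPE" line per column; types should be compatible with the Parquet schema
    col_defs = ",\n    ".join(f"[{name}] {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n    {col_defs}\n)\n"
        "WITH (\n"
        f"    LOCATION = '{location}',\n"
        f"    DATA_SOURCE = [{data_source}],\n"
        f"    FILE_FORMAT = [{file_format}]\n"
        ");"
    )


print(build_external_table_ddl(
    "dbo.ext_sales",
    [("sale_id", "INT"), ("amount", "DECIMAL(18, 2)"), ("last_modified", "DATETIME2")],
    "/sales/",
))
```

Since `LOCATION` points at a folder, files added to that folder are picked up by later queries without re-creating the table.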

It is important to note that if you update data in the SQL Pool directly, you need to make sure the same changes are also made in ADLS Gen2. This can be done using the same ADF pipeline or by updating the data in ADLS Gen2 manually.
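One way to push changes made in the SQL Pool back to ADLS Gen2 without a separate copy tool is `CREATE EXTERNAL TABLE AS SELECT` (CETAS), which writes the result of a query to ADLS Gen2 as files in the external file format. This is a different technique from the ADF pipeline mentioned above. The sketch builds the statement; the table names, base folder, `adls_source` and `parquet_format` are hypothetical, and writing each run to a new timestamped folder is a design assumption made here so that an export never lands on top of a previous one.

```python
from datetime import datetime, timezone


def build_cetas_export(source_table: str, export_table: str, base_folder: str,
                       run_time: datetime, data_source: str = "adls_source",
                       file_format: str = "parquet_format") -> str:
    """Build a CETAS statement that snapshots `source_table` into a new ADLS Gen2 folder."""
    # A fresh, timestamped folder per run, e.g. /exports/sales/20231128T063000Z/
    folder = f"{base_folder.rstrip('/')}/{run_time:%Y%m%dT%H%M%SZ}/"
    return (
        f"CREATE EXTERNAL TABLE {export_table}\n"
        "WITH (\n"
        f"    LOCATION = '{folder}',\n"
        f"    DATA_SOURCE = [{data_source}],\n"
        f"    FILE_FORMAT = [{file_format}]\n"
        ")\n"
        f"AS SELECT * FROM {source_table};"
    )


print(build_cetas_export("dbo.sales", "dbo.sales_export", "/exports/sales/",
                         datetime.now(timezone.utc)))
```

After the export, `DROP EXTERNAL TABLE dbo.sales_export;` removes only the table definition (the exported files stay in ADLS Gen2), which frees the name for the next run; downstream loads would then read from the newest folder.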
