Insert/Update data in Azure Synapse Dedicated SQL Pool and keep data sync in ADLS Gen2.

Mayank Patel 20 Reputation points
2023-11-01T07:17:09.1333333+00:00

We have Azure Synapse Dedicated SQL Pool and load Parquet files from ADLSGen2 to Azure Synapse Dedicated SQL Pool.

Now, we have a use case where an external application would like to update/insert/delete data in Azure Synapse Dedicated SQL Pool. However, the truth of source is ADLSGen2 so how we keep both ADLSGen2 and Azure Synapse Dedicated SQL Pool in a sync?

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
{count} votes

Answer accepted by question author
  1. AnnuKumari-MSFT 34,566 Reputation points Microsoft Employee Moderator
    2023-11-01T09:04:51.22+00:00

    Hi Mayank Patel ,

    Welcome to Microsoft Q& A platform and thanks for posting your query here. As per my understanding you are looking for a way to load the data present in parquet file in ADLS gen2 to Azure synapse dedicated SQL pool , and the data would be updated from the external application which needs to flow till dedicated SQL pool.

    Could you please share more details about the external application? Which data store is external application pointing to? Does the data land from the ext application land on ADLS and the same needs to be copied to ded SQL pool?

    In case you are treating ADLS gen2 as the staging layer for copying the data from external application to dedicated sql pool, then you can consider creating synapse pipelines to do the same.

    You can watchout the below videos to go through the entire workflow on how to create end-to-end pipeline for full load and incremental load while treating ADLS gen2 as staging layer to load the data in parquet format and then load to ded sql pool. You can customize the solution as per the requirement by making necessary changes:

    How to do full load from On Premise SQL Server till ADLS using Azure Synapse Pipelines

    How to load latest and greatest data from ADLS to Dedicated SQL Pool using Synapse Pipelines

    How to perform incremental load from OnPremise SQL server to Dedicated Sql pool

    How to perform Upsert for Incremental records using Azure Synapse Pipelines

    Hope it helps. Please let us know if you have any further queries, kindly share more details on the ext application. Thankyou


1 additional answer

Sort by: Most helpful
  1. Q&A Assist
    2023-11-01T07:17:09.1866667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist. Learn more

    To keep Azure Data Lake Storage Gen2 (ADLSGen2) and Azure Synapse Dedicated SQL Pool in sync, you can use Azure Data Factory (ADF) to copy data from ADLSGen2 to SQL Pool and vice versa. You can create a pipeline in ADF that triggers when there is a change in data in either ADLSGen2 or SQL Pool. The pipeline can copy the changed data to the other location.

    Alternatively, you can use Azure Synapse Analytics to create an external table in SQL Pool that references the data in ADLSGen2. This allows you to query the data in ADLSGen2 from SQL Pool without copying it. When there is a change in data in ADLSGen2, the external table in SQL Pool will reflect the change.

    It is important to note that if you are updating data in SQL Pool directly, you will need to ensure that the changes are also made in ADLSGen2. This can be done using the same pipeline in ADF or by manually updating the data in ADLSGen2.


    References:

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.