Azure Data Factory / Synapse Pipeline / Data Lake / Relational Database / Incremental Load based on Filtered data set

Kieran Wood 101 Reputation points
2023-10-21T11:28:19.43+00:00

Hi,
How would you design a Synapse Pipeline to meet the following requirements …

1)     Load the attached customer name and address data set (in the attached file format) from a data lake.

2)     Filter the customers to those in the city of Sheffield.

3)     Load the resulting data set incrementally into a relational database.

Many thanks,

Kieran

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Amira Bedhiafi 16,306 Reputation points
    2023-10-21T20:04:04.9366667+00:00

    Since you didn't provide enough details about your environment, I am assuming the following:

    You need to start by setting up a 'Copy data' activity to import the dataset from ADLS, specifying the necessary file format. Here is a guide: https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-storage?tabs=data-factory
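
    If you want to prototype the same load in a Synapse Spark notebook before (or instead of) wiring up the 'Copy data' activity, a minimal sketch could look like this. The storage account, container, path and the assumption that the attached file is a CSV with a header row are all placeholders, since the actual format wasn't included:

        # Read the customer name/address file from ADLS Gen2.
        # 'mystorageaccount', 'raw' and the file path are placeholders; adjust to your lake.
        # 'spark' is the session a Synapse notebook provides automatically.
        customers_df = (
            spark.read
                 .option("header", "true")
                 .csv("abfss://raw@mystorageaccount.dfs.core.windows.net/customers/customers.csv")
        )
        customers_df.printSchema()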

    Follow this with a 'Data Flow' activity, applying a 'Filter' transformation to retain records where the city is 'Sheffield'.

    https://www.youtube.com/watch?v=OhbKDOXSfeE
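
    The notebook equivalent of that filter is a one-liner, assuming the column holding the city is named 'City':

        from pyspark.sql.functions import col

        # Keep only the customers located in Sheffield
        sheffield_df = customers_df.filter(col("City") == "Sheffield")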

    For the incremental load, configure the Azure SQL Database as the sink and implement a watermark-based delta pattern: pick a watermark column (for example a last-modified timestamp), store its highest loaded value after each run, and on the next run copy only the rows whose watermark is greater than the stored value, so only new or updated records are loaded.
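
    Continuing the notebook sketch, a rough equivalent of that watermark pattern could look like this. The server, database, credentials, table names and the 'ModifiedDate' watermark column are all assumptions:

        from pyspark.sql.functions import col

        jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"
        props = {"user": "sqluser", "password": "<secret>",
                 "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"}

        # 1. Look up the watermark recorded by the previous run
        last_wm = spark.read.jdbc(
            url=jdbc_url,
            table="(SELECT MAX(WatermarkValue) AS wm FROM dbo.WatermarkTable) w",
            properties=props,
        ).collect()[0]["wm"]

        # 2. Keep only rows that are new or changed since that watermark
        delta_df = sheffield_df.filter(col("ModifiedDate") > last_wm)

        # 3. Append the delta to the target table in Azure SQL Database
        delta_df.write.jdbc(
            url=jdbc_url,
            table="dbo.CustomerSheffield",
            mode="append",
            properties=props,
        )

        # After a successful load, update dbo.WatermarkTable with the new MAX(ModifiedDate)
        # so the next run only picks up rows added or changed after this one.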

    Execute the pipeline on a schedule (via a trigger) or on demand, and monitor the runs so you can troubleshoot and optimize as necessary.

    2 people found this answer helpful.

0 additional answers
