Azure Data Factory Fuzzy logic implementation for multiple tables in the SQL DB

MG 40 Reputation points
2024-03-19T03:46:26.49+00:00

I am trying to build a solution using Azure Data Factory to implement the Fuzzy Logic for two data source which is stored in Synapse SQL DB. for i.e. Table 1 and Table 2 which has identical structures.

I am trying to use ADF to build a pipelines using Data flows to connect to those two tables in the same SQL DB and has one link service created with two parameters which are:

  1. Table name
  2. Schema name

step - 01

Calling Data flow from pipeline task:

User's image

the Data flow task:User's image

Connection to the Synapse SQL DB where source 1 and source 2 are stored:

User's image

the fuzzy logic screen:

User's image

Issues:

  1. How to pass the values such as Table 1 and schema 1 and table 2 and schema 2 to the connection to source 1 and source 2
  2. The how to connect to the source tables and show the values in join conditions to build and join

Thanks

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,549 questions
{count} votes

1 answer

Sort by: Most helpful
  1. AnnuKumari-MSFT 30,751 Reputation points Microsoft Employee
    2024-03-19T07:45:31.48+00:00

    Hi MG ,

    Thankyou for using Microsoft Q&A platform and thanks for posting your query here.

    As per my understanding , you are trying to use join transformation on two tables present in synapse SQLDB, both the sources are connected to same linked service and same dataset which is also parameterized. However, your query is how to pass values for these parameters i.e. tablename and schemaname. Please let me know if that is not the query.

    First of all, there is no need to parameterize linked service as both the tables are in same synapse sql pool/db.

    Coming to dataset , it's not feasible to use same dataset for both the sources, even if you parameterize it, and create parameters in the dataflow level and pass the values from the pipeline calling the dataflow as it will pass same value to both the sources at a time.

    Instead, you need to create two datasets, and directly point each datasets to each of the tables and use join transformation to connect source1 and source2 and perform fuzzy logic.

    Hope it helps. Kindly let me know if you have any further query. Thankyou

    0 comments No comments