How to Implement Fuzzy Matching in Azure Data Factory (ADF) with Similarity and Confidence Scoring Like SSIS Fuzzy Lookup?

Question

How to Implement Fuzzy Matching in Azure Data Factory (ADF) with Similarity and Confidence Scoring Like SSIS Fuzzy Lookup?

Priyanka Rani 0

Hi everyone,

In SQL Server Integration Services (SSIS), the Fuzzy Lookup transformation allows us to match input records against a reference table using fuzzy matching logic. It also automatically generates Similarity and Confidence columns for each matched record.

I’m currently in the process of migrating a data pipeline from SSIS to Azure Data Factory (ADF) and am trying to replicate this functionality.

Key objectives:

Perform fuzzy matching between two datasets (e.g., customer names).

Compute a similarity score or confidence level similar to what SSIS provides.

Retain the best match per input record, if applicable.

Questions:

What’s the recommended way to implement fuzzy matching in ADF?

Is there a built-in feature or transformation in ADF that replicates SSIS’s Fuzzy Lookup behavior?

How can we compute similarity or confidence scores in ADF? I’ve tried using algorithms like Levenshtein and Jaccard, but the results didn’t match the output from SSIS Fuzzy Lookup.

Are there any proven workarounds, such as using Azure Functions, Databricks, or custom activities, to achieve this in a scalable and maintainable way?

Any guidance, best practices, or examples would be greatly appreciated!

Thanks in advance.

Nandan Hegde 36,886 Reputation points MVP Volunteer Moderator

2025-07-08T13:55:57.9133333+00:00

There is Fuzzy matching at join level within ADF dataflow :

https://gbamezai.medium.com/fuzzy-string-matching-in-azure-data-factory-11f5e3de1618
Priyanka Rani 0 Reputation points

2025-07-09T08:03:58.11+00:00

Hi Nandan,

Thanks for taking the time to look into this.

I've already tried this approach, but I'm specifically looking for an exact match on the Similarity and Confidence column values that are generated by the Fuzzy Lookup Transformation.

1 answer

Your answer

Nandan Hegde 36,886 Reputation points MVP Volunteer Moderator

2025-07-08T13:55:57.9133333+00:00

There is Fuzzy matching at join level within ADF dataflow :

https://gbamezai.medium.com/fuzzy-string-matching-in-azure-data-factory-11f5e3de1618
Priyanka Rani 0 Reputation points

2025-07-09T08:03:58.11+00:00

Hi Nandan,

Thanks for taking the time to look into this.

I've already tried this approach, but I'm specifically looking for an exact match on the Similarity and Confidence column values that are generated by the Fuzzy Lookup Transformation.

Answer 1

Hi Priyanka Rani You’re absolutely right this is a common scenario when migrating SSIS workloads to Azure Data Factory (ADF). Unfortunately, ADF does not have a built-in transformation that fully replicates SSIS’s Fuzzy Lookup behavior especially the automatic generation of Similarity and Confidence scores for each matched record.

ADF Data Flows do offer some fuzzy matching capabilities through:

The Join Transformation with approximate string matching logic.
Derived column expressions using custom similarity functions (like Levenshtein or Jaccard).

Recommended Workarounds

Here are the most common alternatives to achieve reliable fuzzy matching in ADF pipelines: Use Azure Databricks or Synapse Spark: This is the most flexible and scalable option for complex fuzzy matching. You can leverage Spark libraries like FuzzyWuzzy or write custom Python/Scala UDFs to compute Similarity and Confidence scores. The results can be written back to your Data Lake or SQL database for downstream use.
Call an Azure Function or Custom Activity: For smaller workloads, you can build a custom Azure Function (Python or C#) with your fuzzy logic. Trigger it from your ADF pipeline to process data and return scores. This works well if you only need to match a moderate volume of records.
Precompute or Cache Reference Matches: If your reference dataset doesn’t change often, you can precompute possible matches and similarity scores. Store these results in a lookup table and join them during your ADF pipeline runs.

I hope this information helps. Please do let us know if you have any further queries.

Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

Venkat Reddy Navari 5,840 Reputation points Microsoft External Staff Moderator

2025-07-11T18:45:22.5033333+00:00

Priyanka Rani Following up to see if the above answer was helpful. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Share via

How to Implement Fuzzy Matching in Azure Data Factory (ADF) with Similarity and Confidence Scoring Like SSIS Fuzzy Lookup?

1 answer

Your answer