Hi Priyanka Rani, you're right: this is a common scenario when migrating SSIS workloads to Azure Data Factory (ADF). Unfortunately, ADF does not have a built-in transformation that fully replicates SSIS's Fuzzy Lookup behavior, especially the automatic generation of Similarity and Confidence scores for each matched record.
ADF Data Flows do offer some fuzzy matching capabilities through:
- The Join Transformation with approximate string matching logic.
- Derived column expressions using custom similarity functions (like Levenshtein or Jaccard).
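For reference, the two similarity measures named above can be sketched in plain Python. This is only an illustration of what the measures compute, not ADF expression syntax; the function names are hypothetical, and the word-level tokenisation used for Jaccard is an assumption (character n-grams are another common choice).

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Distance normalised to 0.0-1.0, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Overlap of lower-cased word tokens: shared tokens / all distinct tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

Levenshtein tends to suit typos and small spelling differences, while token-based Jaccard is more forgiving of reordered or missing words.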
Recommended Workarounds
Here are the most common alternatives to achieve reliable fuzzy matching in ADF pipelines:
- Use Azure Databricks or Synapse Spark: This is the most flexible and scalable option for complex fuzzy matching. You can call a Python library like FuzzyWuzzy from a Spark UDF, or write custom Python/Scala UDFs, to compute Similarity and Confidence scores. The results can be written back to your Data Lake or SQL database for downstream use.
- Call an Azure Function or Custom Activity: For smaller workloads, you can build a custom Azure Function (Python or C#) with your fuzzy logic. Trigger it from your ADF pipeline to process data and return scores. This works well if you only need to match a moderate volume of records.
- Precompute or Cache Reference Matches: If your reference dataset doesn’t change often, you can precompute possible matches and similarity scores. Store these results in a lookup table and join them during your ADF pipeline runs.
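Whichever option you pick, the scoring step looks roughly the same. Below is a minimal sketch using only Python's standard library (`difflib`). The function names, the 0.6 threshold, and in particular the "confidence" formula are assumptions for illustration: SSIS does not expose how it derives Confidence, so the gap between the best and second-best score is used here as a stand-in.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Ratio in [0.0, 1.0]; case and surrounding whitespace are ignored."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(value, reference, threshold=0.6):
    """Return the best reference match for value, or None if below threshold.

    'confidence' is a hypothetical heuristic (best score minus runner-up
    score), not the SSIS Fuzzy Lookup definition.
    """
    scored = sorted(((similarity(value, ref), ref) for ref in reference),
                    reverse=True)
    if not scored or scored[0][0] < threshold:
        return None
    best_score, best_ref = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return {
        "input": value,
        "match": best_ref,
        "similarity": round(best_score, 3),
        "confidence": round(best_score - runner_up, 3),
    }


def build_lookup_table(inputs, reference, threshold=0.6):
    """Precompute one row per input that matches at or above the threshold."""
    rows = (best_match(value, reference, threshold) for value in inputs)
    return [row for row in rows if row is not None]
```

The rows returned by `build_lookup_table` could be written to a SQL table or Data Lake file and joined on the input value during pipeline runs. In Databricks or Synapse the same `best_match` logic could be wrapped in a UDF; in an Azure Function it could sit behind an HTTP trigger. Note that this compares every input against every reference value, so larger datasets would likely need a blocking or candidate-filtering step first.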
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.