Azure SQL Server Fuzzy Logic Matching

Sam Oz 36 Reputation points
2021-06-16T02:40:00.857+00:00

I’m trying to do fuzzy logic address matching between two tables in Azure SQL Server (serverless). Often there are spelling errors in one table only, so I’m trying to find a way to match them.

For example.

Table 1 has an address like this:
106025-image.png

Table 2 has the same address, but with a spelling error in Hornsby, like this:
106014-image.png

I would like to match these two records and insert it into a new table. For example like this:
105960-image.png

Matching Score could be a score for how close a match they are.

I’ve looked into using the Levenshtein Distance algorithm to achieve this. The problem is Azure SQL Server (serverless) doesn’t support registering .NET Assemblies. But maybe there is a better way to do it anyway, like with Azure Data Factory. Any tips on the best Azure tools for this?

Azure SQL Database
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,623 questions
0 comments No comments
{count} votes

4 answers

Sort by: Most helpful
  1. Anurag Sharma 17,631 Reputation points
    2021-06-16T08:11:58.787+00:00

    Hi @Sam Oz , welcome to Microsoft Q&A forum.

    You are right in mentioning that assemblies are not supported in Azure SQL paas. However there are ways to do it as below:

    1. Using Azur Functions: You can create azure functions where in we can read the data from Azure SQL database and write any algorithms.
      Use Azure Functions to connect to an Azure SQL Database
    2. Using Azure Data Factory: We can create ADF and integrate algorithm code on the same.
      Azure Data Factory libraries for .NET

    Based on pricing and requirement one of these can be selected.

    Please let us know if this helps.

    ----------

    if answer helps, please mark it 'Accept Answer'


  2. Anurag Sharma 17,631 Reputation points
    2021-06-17T09:16:19.263+00:00

    Hi @Sam Oz , thanks for your patience.

    Writing this as another answer as word limit is crossing 1000 characters.

    Based on your inputs, the joined dataset between these 2 tables will be huge. We can use Fuzzy looks in ADF which can perform the same and creating multiple parallel pipelines to make it run faster based on ADF compute, but again as the data is huge it could take considerable time to process.

    Another cost effective way to do it is to use SSIS package. If this is just one time activity, I suggest we can export these 2 tables and load them on on-premise SQL server and use the SSIS package which also has Fuzzy look up tasks. Multiple control flow tasks can be created in parallel and based on local system configuration these tasks can be completed. The result of it can be stored in another table which can be exported and loaded back to your Azure SQL Database. This would be cost effective as well as considerable faster as all your processing happening on local system. Below is one sample on using SSIS with fuzzy lookup.

    Fuzzy Lookup Transformation in SSIS

    Please let me know if this helps or you need more details on the same.

    ----------

    if answer helps, please mark it 'Accept Answer'


  3. Ian Santillan 197 Reputation points
    2021-07-23T19:22:50.777+00:00

    Please see this blog pot here from Mark Kromer.

    https://kromerbigdata.com/2019/04/21/use-adf-mapping-data-flows-for-fuzzy-matching-and-dedupe/

    Hope this helps!

    0 comments No comments

  4. Sam Oz 36 Reputation points
    2021-07-26T23:47:17.68+00:00

    Thank you @notyourninja for the link.
    Nothing further @AnuragSharma-MSFT

    Cheers

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.