How do I configure fuzzy matching in Azure Cognitive Search with the full Lucene syntax so that it tolerates errors in the middle of strings?
I'm trying to build a query that can return fuzzy matches from an index, which I'll simplify to the following:
{
  "@odata.context": "",
  "@odata.etag": "",
  "name": "index7",
  "defaultScoringProfile": null,
  "fields": [
    {
      "name": "JurisdictionCode",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchConfiguration": null,
      "synonymMaps": []
    },
    {
      "name": "Aliases",
      "type": "Collection(Edm.ComplexType)",
      "fields": [
        {
          "name": "OriginalName",
          "type": "Edm.String",
          "searchable": false,
          "filterable": false,
          "retrievable": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "normalizer": null,
          "dimensions": null,
          "vectorSearchConfiguration": null,
          "synonymMaps": []
        },
        {
          "name": "NormalName",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "normal_name_analyzer",
          "normalizer": null,
          "dimensions": null,
          "vectorSearchConfiguration": null,
          "synonymMaps": []
        }
      ]
    }
  ],
  "scoringProfiles": [],
  "corsOptions": null,
  "suggesters": [],
  "analyzers": [
    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "name": "normal_name_analyzer",
      "tokenizer": "normal_name_tokenizer",
      "tokenFilters": [
        "lowercase"
      ],
      "charFilters": []
    }
  ],
  "normalizers": [],
  "tokenizers": [
    {
      "@odata.type": "#Microsoft.Azure.Search.NGramTokenizer",
      "name": "normal_name_tokenizer",
      "minGram": 3,
      "maxGram": 3,
      "tokenChars": []
    }
  ],
  "tokenFilters": [],
  "charFilters": [],
  "encryptionKey": null,
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
    "k1": null,
    "b": null
  },
  "semantic": null,
  "vectorSearch": null
}
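For context, my mental model of normal_name_analyzer is that it splits a value into every overlapping lowercase 3-character window (the NGramTokenizer with minGram = maxGram = 3, followed by the lowercase filter). A rough pure-Python sketch of that assumption — my approximation, not the actual Lucene implementation:

```python
def trigrams(text: str) -> list[str]:
    # Approximate normal_name_analyzer: lowercase the input, then emit
    # every overlapping 3-character window (minGram = maxGram = 3).
    text = text.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]

print(trigrams("newyorkknicks"))
# ['new', 'ewy', 'wyo', 'yor', 'ork', 'rkk', 'kkn', 'kni', 'nic', 'ick', 'cks']
```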
I am querying the index via the Python SDK with the following query and options:
Query: (Aliases/NormalName:newyorkknicks AND JurisdictionCode:"us_ny")
Options: {"query_type": "full", "search_mode": "all", "top": 20}
This query succeeds in returning the correct record, newyorkknicks.
If I then update the query by removing characters from the start, the end, or both ends of the search term, the correct record is still identified, e.g.:
Start removed: (Aliases/NormalName:wyorkknicks AND JurisdictionCode:"us_ny")
End removed: (Aliases/NormalName:newyorkknic AND JurisdictionCode:"us_ny")
Both removed: (Aliases/NormalName:wyorkknic AND JurisdictionCode:"us_ny")
However, whenever a character is removed from the middle of the string, no records are returned, e.g.:
(Aliases/NormalName:newyrkknicks AND JurisdictionCode:"us_ny")
Just by looking at the strings, newyrkknicks has a much higher NGram similarity to the indexed value than wyorkknic, so I can't see why the latter returns a match while the former does not. It seems as though some sort of edge-NGram matching is at play, but I haven't configured it like that (at least I don't think I have!).
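To sanity-check that intuition, here's a rough comparison of each variant's trigrams against the indexed term's (using my own approximation of the 3-gram tokenizer plus lowercase filter, not the actual Lucene behaviour):

```python
def trigrams(text: str) -> set[str]:
    # Assumed analyzer behaviour: lowercase, then every overlapping
    # 3-character window (minGram = maxGram = 3).
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

indexed = trigrams("newyorkknicks")
for query in ("wyorkknic", "newyrkknicks"):
    grams = trigrams(query)
    unmatched = sorted(grams - indexed)
    print(f"{query}: {len(grams & indexed)}/{len(grams)} trigrams match, "
          f"unmatched: {unmatched}")
# wyorkknic: 7/7 trigrams match, unmatched: []
# newyrkknicks: 8/10 trigrams match, unmatched: ['wyr', 'yrk']
```

If this approximation is faithful, the middle-edited variant actually matches more trigrams in absolute terms (8 vs. 7), though it is also the only variant that produces trigrams (wyr, yrk) the indexed value never emits — every trigram of wyorkknic is already present in newyorkknicks.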
Does anyone have any suggestions as to what I'm doing wrong here?