How to search for words in Azure Search while ignoring dashes and/or spaces

Question

Serge Lamarre

Hi,

Instead of explaining what I want to do, here are the scenarios I am trying to support. I have a table in Table Storage with a column containing these values:

   7 UP Products
   Sea Pak Shrimp & Seafood Co. Shrimp
   Pop Corners Popped-Corn Snack
   Skinny Dipped Almonds
   La Croix Sparkling Water

I would like Azure Search to return those rows when I search for the following:

   "7UP Product" should return "7 UP Product"
   "SeaPak Shrimp & Seafood Co. Shrimp" should return "Sea Pak Shrimp & Seafood Co. Shrimp"
   "PopCorners Popped-Corn Snack" should return "Pop Corners Popped-Corn Snack"
   "SkinnyDipped Almonds" should return "Skinny Dipped Almonds"
   "La Croix Sparkling Water" should return "LaCroix Sparkling Water"

Thank you.

4 answers

Answer 1

brtrach-MSFT (Microsoft Employee, Moderator)

@Serge Lamarre To search for words in Azure Search while ignoring dashes and/or spaces, you can define a custom analyzer for the fields you search. An analyzer decides how text is split into terms and how those terms are normalized, both when documents are indexed and when queries are processed, so it determines which spellings of a value can match each other.

Here's an example of how you can create a custom analyzer in Azure Search:

Define a custom analyzer that uses the standard tokenizer and the asciifolding and lowercase token filters. The tokenizer splits text into terms on whitespace and most punctuation; the asciifolding filter converts accented and other non-ASCII characters to their ASCII equivalents, and the lowercase filter converts all characters to lowercase.

   { "name": "customanalyzer", "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer", "tokenizer": "standard", "tokenFilters": [ "asciifolding", "lowercase" ] }

Add the custom analyzer to the field definition for the column you want to search. In this example, we'll use the text field.

   { "name": "text", "type": "Edm.String", "searchable": true, "analyzer": "customanalyzer" }
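To see what an analyzer like this does to the sample values, here is a rough Python approximation of the pipeline. It is an assumption-laden sketch, not Azure's implementation: `re.findall(r"\w+", ...)` stands in for the Unicode-aware standard tokenizer, NFKD accent stripping stands in for asciifolding, and the name `approx_analyze` is hypothetical.

```python
import re
import unicodedata


def approx_analyze(text: str) -> list[str]:
    """Rough stand-in for a standard tokenizer + asciifolding + lowercase.

    Runs of word characters become terms, so spaces, '&', '.' and '-'
    all act as separators. This only approximates the real Lucene code.
    """
    # asciifolding (approximate): decompose characters, drop combining accents
    folded = "".join(
        ch for ch in unicodedata.normalize("NFKD", text)
        if not unicodedata.combining(ch)
    )
    # tokenizer + lowercase (approximate)
    return [token.lower() for token in re.findall(r"\w+", folded)]


indexed = approx_analyze("7 UP Products")
query = approx_analyze("7UP Product")
print(indexed)                    # ['7', 'up', 'products']
print(query)                      # ['7up', 'product']
print(set(indexed) & set(query))  # set()
```

In this approximation the indexed terms and the query terms share nothing: the filters normalize characters inside each term, but no step joins "7" and "up" into "7up".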

The intent of this custom analyzer is that a search for "7UP Product" returns results that include "7 UP Products", and a search for "SeaPak Shrimp & Seafood Co. Shrimp" returns results that include "Sea Pak Shrimp & Seafood Co. Shrimp". Note, however, that the asciifolding and lowercase filters only change characters within each term: neither one joins "Sea" and "Pak" into a single term, so whether these queries match depends on how the tokenizer splits the indexed text and the query.
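One way to make such matches possible (a different technique from the analyzer above, offered as an assumption rather than a confirmed fix) is to index each value's terms plus every adjacent pair of terms glued together, so that "Sea Pak" also produces "seapak". Azure AI Search provides a shingle token filter (`#Microsoft.Azure.Search.ShingleTokenFilter`) with a `tokenSeparator` setting that could be set to an empty string for this; whether that exact configuration is accepted should be checked against the current documentation. The sketch below only simulates the idea locally: it does not call Azure, its tokenizer is a regex approximation, and the names `analyze`, `with_joined_pairs` and `SCENARIOS` are hypothetical. The pairing is applied on the index side only, mirroring a field whose `indexAnalyzer` includes the shingle step while its `searchAnalyzer` does not.

```python
import re
import unicodedata


def analyze(text: str) -> list[str]:
    """Rough approximation of a standard tokenizer + asciifolding + lowercase."""
    folded = "".join(
        ch for ch in unicodedata.normalize("NFKD", text) if not unicodedata.combining(ch)
    )
    return [token.lower() for token in re.findall(r"\w+", folded)]


def with_joined_pairs(tokens: list[str]) -> set[str]:
    """Mimic a shingle filter: size 2, single terms kept, empty separator.

    ["sea", "pak", "shrimp"] -> {"sea", "pak", "shrimp", "seapak", "pakshrimp"}
    """
    terms = set(tokens)
    terms.update(left + right for left, right in zip(tokens, tokens[1:]))
    return terms


# (value stored in the table, query without the space, term that should connect them)
SCENARIOS = [
    ("7 UP Products", "7UP Product", "7up"),
    ("Sea Pak Shrimp & Seafood Co. Shrimp", "SeaPak Shrimp & Seafood Co. Shrimp", "seapak"),
    ("Pop Corners Popped-Corn Snack", "PopCorners Popped-Corn Snack", "popcorners"),
    ("Skinny Dipped Almonds", "SkinnyDipped Almonds", "skinnydipped"),
    ("La Croix Sparkling Water", "LaCroix Sparkling Water", "lacroix"),
]

for value, query, brand in SCENARIOS:
    query_terms = set(analyze(query))                # search side: plain analysis
    plain_index = set(analyze(value))                # index side without pairs
    paired_index = with_joined_pairs(analyze(value))  # index side with joined pairs
    print(
        f"{query!r} vs {value!r}: "
        f"brand matched without pairs={brand in (query_terms & plain_index)}, "
        f"with pairs={brand in (query_terms & paired_index)}"
    )
```

The trade-off is extra indexed terms (such as "upproducts") that can occasionally cause unrelated matches. Pairing only at index time also leaves the reverse direction uncovered (a query with a space against a stored value without one), and since the query parser may split unquoted queries on whitespace before analysis, query-side pairing might not help there anyway.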

Serge Lamarre

Hi, I am trying to edit the index definition in the Index JSON Editor within the Azure Portal and I am getting this error message: The request is invalid. Details: definition : The tokenizer of type 'standard' is not supported in the API version '2023-10-01-Preview'.

How can I solve this? The index definition is below. The fields that need the analyzer are BrandHigh and BrandLow.

{
  "@odata.context": "https://eus2poc-ai-search-retailersaleshub.search.windows.net/$metadata#indexes/$entity",
  "@odata.etag": "\"0x8DC2CD4A366710B\"",
  "name": "brands-index",
  "defaultScoringProfile": "",
  "fields": [
    {
      "name": "PartitionKey",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "RowKey",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "ETag",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "Timestamp",
      "type": "Edm.DateTimeOffset",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "Key",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": true,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "BrandHigh",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "BrandLow",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "BrandManufacturer",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "BrandOwner",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "BrandRevenue",
      "type": "Edm.Double",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    }
  ],
  "scoringProfiles": [],
  "corsOptions": null,
  "suggesters": [],
  "analyzers": [],
  "normalizers": [],
  "tokenizers": [],
  "tokenFilters": [],
  "charFilters": [],
  "encryptionKey": null,
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
    "k1": null,
    "b": null
  },
  "semantic": null,
  "vectorSearch": null
}
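The change being asked about can be sketched in Python: register an analyzer that uses `standard_v2` (the tokenizer name that the later replies in this thread report as accepted) and point BrandHigh and BrandLow at it. `use_custom_analyzer` is a hypothetical helper, not part of any SDK; it edits the definition as a plain dictionary, for example after loading it with `json.load`. Keep in mind that Azure AI Search generally does not allow changing the analyzer of an existing field in place, so the updated definition usually has to be used to create a new index (or the index dropped and recreated) and then repopulated by re-running the indexer.

```python
import copy
import json

# Analyzer from Answer 1, using "standard_v2" because the API rejected "standard"
CUSTOM_ANALYZER = {
    "name": "customanalyzer",
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "tokenizer": "standard_v2",
    "tokenFilters": ["asciifolding", "lowercase"],
}


def use_custom_analyzer(index_def: dict, field_names: set[str]) -> dict:
    """Return a copy of index_def in which field_names use CUSTOM_ANALYZER."""
    updated = copy.deepcopy(index_def)

    # Register the analyzer at index level (only once)
    analyzers = updated.setdefault("analyzers", [])
    if not any(a.get("name") == CUSTOM_ANALYZER["name"] for a in analyzers):
        analyzers.append(copy.deepcopy(CUSTOM_ANALYZER))

    for field in updated.get("fields", []):
        if field.get("name") in field_names:
            # "analyzer" applies at index and query time, so the separate
            # indexAnalyzer/searchAnalyzer properties are left unset
            field["analyzer"] = CUSTOM_ANALYZER["name"]
            field["indexAnalyzer"] = None
            field["searchAnalyzer"] = None
    return updated


if __name__ == "__main__":
    # Trimmed copy of brands-index: only the two target fields are shown
    brands_index = {
        "name": "brands-index",
        "fields": [
            {"name": "BrandHigh", "type": "Edm.String", "searchable": True,
             "indexAnalyzer": None, "searchAnalyzer": None, "analyzer": "standard.lucene"},
            {"name": "BrandLow", "type": "Edm.String", "searchable": True,
             "indexAnalyzer": None, "searchAnalyzer": None, "analyzer": "standard.lucene"},
        ],
        "analyzers": [],
    }
    print(json.dumps(use_custom_analyzer(brands_index, {"BrandHigh", "BrandLow"}), indent=2))
```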

Answer 2

Hi,

I tried what you outlined, but it did not work. I created an index with the definition further below. When I sent the POST request with ?api-version=2023-11-01, the API complained that the normalizer properties are not part of the 2023-11-01 schema. I removed all of them and then got an error that the standard tokenizer does not exist in API version 2023-11-01, so I replaced it with standard_v2. The POST then succeeded and the index was created. Next, I created an indexer and ran it until it completed successfully.
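The manual cleanup described here (dropping the normalizer properties before sending the definition with api-version=2023-11-01) could be scripted roughly as follows. `strip_unsupported` is a hypothetical helper, and the keys it drops only reflect the error reported in this thread, not a complete list of differences between API versions.

```python
import copy

# Properties the 2023-11-01 request rejected in this thread
INDEX_KEYS_TO_DROP = ("normalizers",)  # index-level list
FIELD_KEYS_TO_DROP = ("normalizer",)   # per-field property


def strip_unsupported(index_def: dict) -> dict:
    """Return a copy of index_def without the normalizer properties."""
    cleaned = copy.deepcopy(index_def)
    for key in INDEX_KEYS_TO_DROP:
        cleaned.pop(key, None)
    for field in cleaned.get("fields", []):
        for key in FIELD_KEYS_TO_DROP:
            field.pop(key, None)
    return cleaned
```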

Then I searched for the strings listed in my original post, and nothing is returned.

What did I do wrong?

Query JSON

{
  "search": "7UP Products",
  "queryType": "full",
  "searchMode": "any",
  "select": "BrandOwner, BrandManufacturer, BrandHigh, BrandLow, BrandRevenue",
  "searchFields": "BrandHigh,BrandLow",
  "filter": "BrandRevenue gt 10000",
  "top": 10,
  "scoringStatistics": "local"
}

Index Definition

{
  "name": "brands-test-index",
  "defaultScoringProfile": "",
  "fields": [
    {
      "name": "PartitionKey",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "RowKey",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "ETag",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "Timestamp",
      "type": "Edm.DateTimeOffset",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "Key",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": true,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "BrandHigh",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "customanalyzer",
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "BrandLow",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "customanalyzer",
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "BrandManufacturer",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "BrandOwner",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "BrandRevenue",
      "type": "Edm.Double",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    }
  ],
  "scoringProfiles": [],
  "corsOptions": null,
  "suggesters": [],
  "analyzers": [ 
      { "name": "customanalyzer", 
        "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer", 
        "tokenizer": "standard_v2", 
        "tokenFilters": [ "asciifolding", "lowercase" ] 
      } 
  ],
  "tokenizers": [],
  "tokenFilters": [],
  "charFilters": [],
  "encryptionKey": null,
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
    "k1": null,
    "b": null
  },
  "semantic": null,
  "vectorSearch": null
}
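When a query returns nothing, it can help to see which terms the analyzer actually produced. Azure AI Search exposes an Analyze Text REST endpoint (POST /indexes/{index}/analyze) that returns the tokens for a given text and analyzer. The sketch below only builds that request with the Python standard library; `build_analyze_request` is a hypothetical helper, the api-key value is a placeholder, and the service name is taken from the @odata.context URL in the first index definition, assuming the test index lives on the same service.

```python
import json
import urllib.request

API_VERSION = "2023-11-01"


def build_analyze_request(
    service: str, index: str, api_key: str, text: str, analyzer: str
) -> urllib.request.Request:
    """Build (but do not send) an Analyze Text request for an existing index."""
    url = f"https://{service}.search.windows.net/indexes/{index}/analyze?api-version={API_VERSION}"
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    request = build_analyze_request(
        service="eus2poc-ai-search-retailersaleshub",
        index="brands-test-index",
        api_key="<your-admin-key>",  # placeholder, not a real key
        text="7 UP Products",
        analyzer="customanalyzer",
    )
    # Sending it returns JSON with a "tokens" list, one entry per emitted term:
    # with urllib.request.urlopen(request) as response:
    #     for token in json.load(response)["tokens"]:
    #         print(token["token"])
    print(request.get_method(), request.full_url)
```

Running the request for both "7 UP Products" and "7UP Product" and comparing the two token lists shows directly whether any indexed term can match a query term.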

Answer 3

@Serge Lamarre I'm sorry to hear that the previous solution did not work for you. It seems that the issue you are facing is related to the API version you are using. The "standard" tokenizer is not supported in the API version '2023-10-01-Preview', which is why you received an error message when you tried to create the index. To solve this issue, you can try using the "standard_v2" tokenizer instead of the "standard" tokenizer. This tokenizer is supported in the API version '2023-11-01', which is the latest version at the time of writing this message. You can update your index definition to use the "standard_v2" tokenizer as follows:

{
  "name": "brands-test-index",
  "defaultScoringProfile": "",
  "fields": [
    {
      "name": "PartitionKey",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "RowKey",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "ETag",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "synonymMaps": []
    },
    {
      "name": "Timestamp",
      "type": "Edm.DateTimeOffset",
      "

Answer 4

Serge Lamarre

Hi, the index JSON definition you tried to paste was truncated, so I cannot see what you meant. Could you please post the relevant entries you were suggesting? Also, I did use standard_v2, as mentioned in my first reply. Could you check whether the JSON definition I pasted earlier makes sense? Thank you.

brtrach-MSFT (Microsoft Employee, Moderator)

2024-02-20T21:29:44.98+00:00

Please see the private message we sent you. It can be viewed by clicking the banner at the top of the page.
