How to search words in Azure Search while ignoring dashes and/or spaces

Serge Lamarre 0 Reputation points
2024-02-15T19:56:43.54+00:00

Hi,

Instead of explaining what I want to do, here are the scenarios I am trying to support. I have a Table storage table with a column containing these values:

- 7 UP Products
- Sea Pak Shrimp & Seafood Co. Shrimp
- Pop Corners Popped-Corn Snack
- Skinny Dipped Almonds
- La Croix Sparkling Water

I would like Azure Search to return those rows when I search for:

- "7UP Product" should return "7 UP Products"
- "SeaPak Shrimp & Seafood Co. Shrimp" should return "Sea Pak Shrimp & Seafood Co. Shrimp"
- "PopCorners Popped-Corn Snack" should return "Pop Corners Popped-Corn Snack"
- "SkinnyDipped Almonds" should return "Skinny Dipped Almonds"
- "LaCroix Sparkling Water" should return "La Croix Sparkling Water"

Thank you.

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

4 answers

  1. brtrach-MSFT 17,741 Reputation points Microsoft Employee Moderator
    2024-02-16T00:07:40.5866667+00:00

    @Serge Lamarre To search words in Azure Search and ignore dashes and/or spaces, you can use a custom analyzer that preserves special characters and spaces. This will allow you to search for terms that include special characters and spaces, and return results that match the original string.

    Here's an example of how you can create a custom analyzer in Azure Search:

    1. Define a custom analyzer that uses the standard tokenizer and the asciifolding and lowercase token filters. The asciifolding filter removes accents and diacritics from characters, while the lowercase filter converts all characters to lowercase.
       {
         "name": "customanalyzer",
         "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
         "tokenizer": "standard",
         "tokenFilters": [ "asciifolding", "lowercase" ]
       }

    2. Add the custom analyzer to the field definition for the column you want to search. In this example, we'll use the text field.
       {
         "name": "text",
         "type": "Edm.String",
         "searchable": true,
         "analyzer": "customanalyzer"
       }


    With this custom analyzer, you can search for terms that include special characters and spaces, and return results that match the original string. For example, if you search for "7UP Product", you'll get results that include "7 UP Products". Similarly, if you search for "SeaPak Shrimp & Seafood Co. Shrimp", you'll get results that include "Sea Pak Shrimp & Seafood Co. Shrimp".
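
One way to check that expectation before building an index is to approximate locally what the analyzer emits. The sketch below is a rough stand-in written for illustration, not the service's implementation: `approx_analyze` is a hypothetical helper that folds accents, splits on anything that is not an ASCII letter or digit, and lowercases the result. The service's own Analyze Text API is the authoritative way to see the real tokens.

```python
import re
import unicodedata


def approx_analyze(text: str) -> list[str]:
    """Rough stand-in for: standard tokenizer -> asciifolding -> lowercase."""
    # Fold accented characters to ASCII base letters (approximates asciifolding).
    folded = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Treat every run of ASCII letters/digits as one token and lowercase it
    # (a simplification of how the standard tokenizer splits text).
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", folded)]


indexed = approx_analyze("7 UP Products")
queried = approx_analyze("7UP Products")

print(indexed)                              # ['7', 'up', 'products']
print(queried)                              # ['7up', 'products']
print(sorted(set(indexed) & set(queried)))  # ['products']
```

Under this approximation the stored value yields `7` and `up` while the query yields the single token `7up`, so the brand-name tokens do not line up and only shared words such as `products` overlap. Lowercasing and ASCII folding alone do not bridge a missing space.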


  2. Serge Lamarre 0 Reputation points
    2024-02-16T15:50:55.2066667+00:00

    Hi,

    I tried what you outlined, but it was not successful. I created an index with the definition further below. When I sent the POST with ?api-version=2023-11-01, it complained that the normalizers fields were not in the schema for API version 2023-11-01. I removed all of those, and then I got an error that the standard tokenizer does not exist in API version 2023-11-01, so I replaced it with standard_v2. The POST operation then succeeded and the index was created. Then I created an indexer and ran it to successful completion.

    Then I tried to search the strings outlined in my original post and nothing is getting returned...

    What did I do wrong?

    Query JSON

    {
      "search": "7UP Products",
      "queryType": "full",
      "searchMode": "any",
      "select": "BrandOwner, BrandManufacturer, BrandHigh, BrandLow, BrandRevenue",
      "searchFields": "BrandHigh,BrandLow",
      "filter": "BrandRevenue gt 10000",
      "top": 10,
      "scoringStatistics": "local"
    }
    

    Index Definition

    {
      "name": "brands-test-index",
      "defaultScoringProfile": "",
      "fields": [
        {
          "name": "PartitionKey",
          "type": "Edm.String",
          "searchable": false,
          "filterable": false,
          "retrievable": false,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "RowKey",
          "type": "Edm.String",
          "searchable": false,
          "filterable": false,
          "retrievable": false,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "ETag",
          "type": "Edm.String",
          "searchable": false,
          "filterable": false,
          "retrievable": false,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "Timestamp",
          "type": "Edm.DateTimeOffset",
          "searchable": false,
          "filterable": false,
          "retrievable": false,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "Key",
          "type": "Edm.String",
          "searchable": false,
          "filterable": false,
          "retrievable": true,
          "sortable": false,
          "facetable": false,
          "key": true,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "BrandHigh",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "customanalyzer",
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "BrandLow",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "customanalyzer",
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "BrandManufacturer",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "standard.lucene",
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "BrandOwner",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "standard.lucene",
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "BrandRevenue",
          "type": "Edm.Double",
          "searchable": false,
          "filterable": true,
          "retrievable": true,
          "sortable": true,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        }
      ],
      "scoringProfiles": [],
      "corsOptions": null,
      "suggesters": [],
      "analyzers": [
        {
          "name": "customanalyzer",
          "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
          "tokenizer": "standard_v2",
          "tokenFilters": [ "asciifolding", "lowercase" ]
        }
      ],
      "tokenizers": [],
      "tokenFilters": [],
      "charFilters": [],
      "encryptionKey": null,
      "similarity": {
        "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
        "k1": null,
        "b": null
      },
      "semantic": null,
      "vectorSearch": null
    }
    

  3. brtrach-MSFT 17,741 Reputation points Microsoft Employee Moderator
    2024-02-17T05:41:17.8566667+00:00

    @Serge Lamarre I'm sorry to hear that the previous solution did not work for you. It seems that the issue you are facing is related to the API version you are using. The "standard" tokenizer is not supported in API version '2023-10-01-Preview', which is why you received an error message when you tried to create the index. To solve this, try the "standard_v2" tokenizer instead of the "standard" tokenizer. It is supported in API version '2023-11-01', which is the latest version at the time of writing. You can update your index definition to use the "standard_v2" tokenizer as follows:

    {
      "name": "brands-test-index",
      "defaultScoringProfile": "",
      "fields": [
        {
          "name": "PartitionKey",
          "type": "Edm.String",
          "searchable": false,
          "filterable": false,
          "retrievable": false,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "RowKey",
          "type": "Edm.String",
          "searchable": false,
          "filterable": false,
          "retrievable": false,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "ETag",
          "type": "Edm.String",
          "searchable": false,
          "filterable": false,
          "retrievable": false,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "synonymMaps": []
        },
        {
          "name": "Timestamp",
          "type": "Edm.DateTimeOffset",
          "
    
    

  4. Serge Lamarre 0 Reputation points
    2024-02-17T20:56:30.16+00:00

    Hi, The index JSON definition that you tried to paste was truncated, so I cannot see what you meant. Can you please add the entries you were suggesting? Also, I did use standard_v2, as mentioned in my first reply. Can you look at the JSON definition I pasted earlier and tell me whether it makes sense? Thank you.
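
One direction that may be worth testing for the scenarios in the question is a shingle token filter with an empty separator, applied at indexing time only, so that adjacent words such as "Sea" and "Pak" are also indexed as the joined token "seapak". The sketch below is an assumption-laden illustration, not a verified fix: `tokens` and `with_shingles` are hypothetical helpers that only approximate the analysis chain, and the names `join_adjacent_words` and `brand_index_analyzer` are made up for this example. The fragment has not been run against a service; it assumes the field's `indexAnalyzer` would point at the new analyzer while `searchAnalyzer` keeps an analyzer without shingles and `analyzer` stays null.

```python
import json
import re


def tokens(text: str) -> list[str]:
    # Lowercased runs of ASCII letters/digits; a rough stand-in for
    # standard_v2 + lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def with_shingles(toks: list[str]) -> set[str]:
    # Keep the single tokens and add each adjacent pair joined with no
    # separator (approximates a 2-word shingle filter with tokenSeparator "").
    pairs = {a + b for a, b in zip(toks, toks[1:])}
    return set(toks) | pairs


# Hypothetical index fragment; names are illustrative and the definition is
# unverified against the service.
analysis_fragment = {
    "tokenFilters": [
        {
            "name": "join_adjacent_words",
            "@odata.type": "#Microsoft.Azure.Search.ShingleTokenFilter",
            "minShingleSize": 2,
            "maxShingleSize": 2,
            "outputUnigrams": True,
            "tokenSeparator": "",
        }
    ],
    "analyzers": [
        {
            "name": "brand_index_analyzer",
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "tokenizer": "standard_v2",
            "tokenFilters": ["asciifolding", "lowercase", "join_adjacent_words"],
        }
    ],
}

stored = with_shingles(tokens("Sea Pak Shrimp & Seafood Co. Shrimp"))
query = tokens("SeaPak Shrimp")

print(all(t in stored for t in query))  # True
print(json.dumps(analysis_fragment, indent=2))
```

This only covers queries that drop a space or dash present in the stored value. The reverse case, where the stored value is "LaCroix" and the query is "La Croix", would need a different treatment.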

