Getting additional records when filter applied in Azure search

Mukesh Kumar 0 Reputation points
2023-09-13T11:25:59.7966667+00:00

Hi Team,

I am getting additional records when I apply a filter in Azure Cognitive Search. Please find the details below.

Data source JSON file

[
    {
        "Emissions": "US EPA (Certified) Stationary Emergency|US EPA (Certified) Stationary Non-Emergency",
        "AzureSearch_DocumentKey": "aHR0cHM6Ly9waW1jb3Ntb3NmZWVkdGVzdC5ibG9iLmNvcmUud2luZG93cy5uZXQvYmxvb21yZWFjaC9pbmR1c3RyaWFsL0luZHVzdHJpYWwuanNvbjszNA2",
        "ProductProductNo": "250REZXB_60_DD"
    },
    {
        "Emissions": "US EPA (Certified) Stationary Emergency|US EPA (Certified) Stationary Non-Emergency",
        "AzureSearch_DocumentKey": "aHR0cHM6Ly9waW1jb3Ntb3NmZWVkdGVzdC5ibG9iLmNvcmUud2luZG93cy5uZXQvYmxvb21yZWFjaC9pbmR1c3RyaWFsL0luZHVzdHJpYWwuanNvbjszNA2",
        "ProductProductNo": "250REZXB_60_DF"
    },
    {
        "Emissions": "US EPA (Certified) Stationary and Mobile Emergency|US EPA (Certified) Stationary and Mobile Non-Emergency",
        "AzureSearch_DocumentKey": "aHR0cHM6Ly9waW1jb3Ntb3NmZWVkdGVzdC5ibG9iLmNvcmUud2luZG93cy5uZXQvYmxvb21yZWFjaC9pbmR1c3RyaWFsL0luZHVzdHJpYWwuanNvbjszNA2",
        "ProductProductNo": "250REZXB_60_EE"
    }
]

Request Body

{
    "count": true
    ,"top": 3000
    ,"skip": 0
    ,"search": "*"
    ,"orderby": ""
    ,"facets":["Emissions,count:100,sort:-count"]
    ,"filter": "search.ismatch('US EPA (Certified) Stationary Non-Emergency', 'Emissions', 'full','all')"
}

Response

{
    "@odata.context": "",
    "@odata.count": 86,
    "@search.facets": {
        "Emissions": [
            {
                "count": 3,
                "value": "US EPA (Certified) Stationary Non-Emergency, Prime, and Continuous"
            },
            {
                "count": 9,
                "value": "US EPA (Certified) Stationary and Mobile Emergency|US EPA (Certified) Stationary and Mobile Non-Emergency"
            },
            {
                "count": 10,
                "value": "US EPA (Certified) Stationary Emergency|US EPA (Certified) Stationary Non-Emergency"
            }
        ]
    },
"value":[]
}

In the response above, the first and third facet values contain the filter text, but I am not sure why

"US EPA (Certified) Stationary and Mobile Emergency|US EPA (Certified) Stationary and Mobile Non-Emergency"

is returned in the response. I have added the analyzers below, but I am still not able to get only the exact matching records.

"analyzers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "standardCmAnalyzer",
            "tokenizer": "standard_v2",
            "tokenFilters": [
                "lowercase",
                "asciifolding"
            ],
            "charFilters": []
        },
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "prefixCmAnalyzer",
            "tokenizer": "EdgeNGramTokenizer",
            "tokenFilters": [
                "lowercase",
                "asciifolding",
                "edgeNGramCmTokenFilter"
            ],
            "charFilters": []
        }
    ],
    "tokenizers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenizer",
            "name": "EdgeNGramTokenizer",
            "minGram": 9,
            "maxGram": 40
        }
    ],
    "tokenFilters": [
        {
            "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
            "name": "edgeNGramCmTokenFilter",
            "minGram": 9,
            "maxGram": 40,
            "side": "front"
        }
    ],

Can you please help me with this?

Azure AI Search

1 answer

  1. Deepanshukatara-6769 16,565 Reputation points Moderator
    2023-09-13T12:20:23.9066667+00:00

    Hi, hope you are doing well!

    Based on the information you provided, the issue seems to be related to how the data is indexed and how the filter is applied. Here are some suggestions to help you resolve it:

    1. Analyzer Configuration: It's important to ensure that your analyzer configuration aligns with your search requirements. In your case, you're using standardCmAnalyzer and prefixCmAnalyzer, which involve tokenization, lowercase conversion, and ASCII folding. Make sure these analyzers are applied consistently during indexing and querying.
    2. Index Field Configuration: Verify that the "Emissions" field in your index is using the correct analyzer. Ensure that it uses the same analyzer (e.g., standardCmAnalyzer or prefixCmAnalyzer) for indexing and searching to achieve the desired results. If you've recently updated your analyzer, reindexing the data might be necessary.
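    3. Search Query: Your filter uses search.ismatch, which runs a full-text match against the analyzed tokens of the field rather than an exact string comparison. With the 'full' and 'all' options, a document matches when it contains all of the query terms, regardless of their order or adjacency, which is likely why the "Stationary and Mobile" value is returned. You might want to try search.ismatch without the 'full' and 'all' options, keeping in mind that it then falls back to the defaults ('simple' query type and 'any' search mode), which match more broadly rather than more strictly.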

    Example filter query:

    
    "filter": "search.ismatch('US EPA (Certified) Stationary Non-Emergency', 'Emissions')"
    

    Make sure the text you're searching for ('US EPA (Certified) Stationary Non-Emergency') exactly matches the indexed data.
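
    To make the matching behaviour concrete, here is a minimal Python sketch that approximates why the original filter also returns the "Stationary and Mobile" records. It is not the service's implementation: `tokenize` is a crude stand-in for a standard analyzer, and `phrase_filter` is a hypothetical helper that wraps the search text in double quotes so that Lucene full syntax treats it as a phrase. The quoted form is an assumption that has not been run against your index, so please verify it there.

```python
import json
import re

# Emissions values copied from the facet results in the question.
EMISSIONS = [
    "US EPA (Certified) Stationary Non-Emergency, Prime, and Continuous",
    "US EPA (Certified) Stationary and Mobile Emergency|"
    "US EPA (Certified) Stationary and Mobile Non-Emergency",
    "US EPA (Certified) Stationary Emergency|"
    "US EPA (Certified) Stationary Non-Emergency",
]
QUERY = "US EPA (Certified) Stationary Non-Emergency"


def tokenize(text):
    """Rough analyzer stand-in: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_terms(query, value):
    """True when every query token occurs somewhere in the value (order ignored)."""
    value_tokens = set(tokenize(value))
    return all(token in value_tokens for token in tokenize(query))


def matches_phrase(query, value):
    """True when the query tokens occur consecutively and in order in the value."""
    q, v = tokenize(query), tokenize(value)
    return any(v[i:i + len(q)] == q for i in range(len(v) - len(q) + 1))


def phrase_filter(phrase, field):
    """Hypothetical helper: build a search.ismatch filter around a quoted phrase.

    Assumes the phrase itself contains no double quotes or backslashes.
    """
    escaped = phrase.replace("'", "''")  # OData string literals double single quotes
    return f"search.ismatch('\"{escaped}\"', '{field}', 'full', 'all')"


# Term matching accepts all three values; phrase matching rejects the second.
all_terms_hits = [matches_all_terms(QUERY, value) for value in EMISSIONS]
phrase_hits = [matches_phrase(QUERY, value) for value in EMISSIONS]

# json.dumps takes care of escaping the inner double quotes for the request body.
request_body = json.dumps(
    {"search": "*", "count": True, "filter": phrase_filter(QUERY, "Emissions")}
)
```

    In this approximation every query term also appears in the "Stationary and Mobile" value, so an all-terms match accepts it, while a phrase match does not because "and Mobile" sits between "Stationary" and "Non-Emergency". If you need strict equality rather than a phrase match, a filterable field compared with eq may be a better fit than search.ismatch.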

    4. Check for Special Characters: Sometimes, special characters or encoding issues can affect search results. Ensure that there are no hidden characters or encoding differences between your filter query and the indexed data.

    5. Reindex Data: If you've made changes to your analyzer configuration or suspect data indexing issues, consider reindexing your data to ensure the new analyzer settings are applied correctly.
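
    For the special-character check in point 4, a small Python sketch like the one below can list characters outside printable ASCII in a string copied from your source data or filter. The `dirty` value is a made-up example, not taken from your data, and `suspicious_chars` is a hypothetical helper name.

```python
import unicodedata


def suspicious_chars(text):
    """Return (index, code point, Unicode name) for each char outside printable ASCII."""
    found = []
    for index, char in enumerate(text):
        if not 32 <= ord(char) < 127:
            name = unicodedata.name(char, "UNKNOWN")
            found.append((index, f"U+{ord(char):04X}", name))
    return found


clean = "US EPA (Certified) Stationary Non-Emergency"
# Made-up damaged copy: a no-break space and a non-breaking hyphen look
# identical on screen but are different characters from ' ' and '-'.
dirty = "US EPA\u00a0(Certified) Stationary Non\u2011Emergency"

report = suspicious_chars(dirty)
for index, code_point, name in report:
    print(index, code_point, name)
```

    Note that this also flags legitimate non-ASCII characters such as accented letters, so treat the output as a list of candidates to inspect rather than a list of errors.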

    Please accept the answer if it helps. Thank you!

