Language Service is not returning correct results for plural words

Rhuddhi Gawas | MAQ Software 6 Reputation points
2023-05-10T12:04:19.32+00:00

When I query the Language Service with 'determining budgetary units' and with 'determining budgetary unit', it returns different results for the two inputs.

For 'determining budgetary units', it returned a result (screenshot not available) with a confidence score of 0.53 (53%).

For 'determining budgetary unit', it returned a result (screenshot not available) with a confidence score of 0.71 (71%).

Ideally, both queries should return the same output. How can I handle this in the Language Service?

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

1 answer

  1. Sedat SALMAN 13,160 Reputation points
    2023-05-13T15:23:25.76+00:00

    Some language services and other natural language processing (NLP) tools treat the singular and plural forms of a word as different words, which is why the two queries return different results and confidence scores.

    To get around this issue, you could preprocess your queries to reduce all words to their base or root form, a process known as "lemmatization".

    In Python, you can use libraries such as NLTK (Natural Language Toolkit) or spaCy for lemmatization.

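    import nltk
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    # One-time downloads: tokenizer data and the WordNet corpus.
    # Newer NLTK releases look for "punkt_tab" instead of "punkt".
    for resource in ("punkt", "punkt_tab", "wordnet"):
        nltk.download(resource, quiet=True)

    lemmatizer = WordNetLemmatizer()

    def lemmatize_query(query):
        """Return the query with each token reduced to its lemma."""
        # lemmatize() treats every token as a noun unless a pos argument is given
        tokens = word_tokenize(query)
        return ' '.join(lemmatizer.lemmatize(token) for token in tokens)

    query = "determining budgetary units"
    lemmatized_query = lemmatize_query(query)  # "determining budgetary unit"
    # now use lemmatized_query with your language service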
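
Before calling the service, it may help to confirm that the preprocessing really maps both queries to the same string. The sketch below uses only the standard library. `naive_singular` and `normalize_query` are hypothetical helpers written for illustration, and the suffix rule is a crude stand-in for a real lemmatizer, not part of NLTK or the Language Service.

```python
import re

def naive_singular(word: str) -> str:
    """Hypothetical stand-in for a lemmatizer: strips common plural endings.

    Deliberately naive; it will mishandle many English words.
    """
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word

def normalize_query(query: str, lemmatize=naive_singular) -> str:
    """Lowercase the query, split it into word tokens, and normalize each one."""
    tokens = re.findall(r"[a-z0-9']+", query.lower())
    return " ".join(lemmatize(token) for token in tokens)

plural = normalize_query("determining budgetary units")
singular = normalize_query("determining budgetary unit")

# Both inputs should now be identical, so the service receives the same text.
print(plural == singular)
```

The `lemmatize` parameter accepts any function that maps one token to its base form, so a real lemmatizer such as `lemmatizer.lemmatize` from the NLTK example can be passed in place of the naive rule.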
    
    