Language Service is not returning correct results for plural words

Question

When I query the Language Service with 'determining budgetary units' and with 'determining budgetary unit', it responds with different results for the two inputs.

For 'determining budgetary units' it returned a result with a confidence score of 0.53 (53%).

For 'determining budgetary unit' it returned a result with a confidence score of 0.71 (71%).

Ideally, both queries should return the same output. How can I handle this in the Language Service?

Answer

Some language services, including Natural Language Processing (NLP) tools, treat the singular and plural forms of a word as different words, so 'units' and 'unit' can match differently and produce different confidence scores.

To get around this issue, you could preprocess your queries to reduce all words to their base or root form, a process known as "lemmatization".

In Python, you can use libraries like NLTK (Natural Language Toolkit) or spaCy for lemmatization. For example, with NLTK:

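import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads of the data used below; newer NLTK releases may ask for "punkt_tab" instead of "punkt".
nltk.download("punkt")
nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()

def lemmatize_query(query):
    # Split the query into tokens, then lemmatize each one.
    # lemmatize() treats every word as a noun by default, which is enough to map "units" to "unit".
    word_list = word_tokenize(query)
    return ' '.join(lemmatizer.lemmatize(w) for w in word_list)

query = "determining budgetary units"
lemmatized_query = lemmatize_query(query)  # "determining budgetary unit"
# now use lemmatized_query with your language service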
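
If installing NLTK and its data is not an option, a rough rule-based normalizer might be enough for simple plurals. The following is only a minimal sketch using the Python standard library: it is a naive suffix-stripping heuristic, not true lemmatization, and `naive_singularize` and `normalize_query` are hypothetical helper names, not part of NLTK or of any Language Service API. It will get irregular plurals (for example 'children') wrong.

```python
import re

def naive_singularize(word: str) -> str:
    """Very rough English plural stripping; a heuristic, not a lemmatizer."""
    lower = word.lower()
    if len(lower) <= 3:
        return lower  # leave short words such as "bus" or "is" alone
    if lower.endswith("ies"):
        return lower[:-3] + "y"  # "policies" -> "policy"
    if lower.endswith(("sses", "shes", "ches", "xes", "zes")):
        return lower[:-2]  # "boxes" -> "box"
    if lower.endswith("s") and not lower.endswith(("ss", "us", "is")):
        return lower[:-1]  # "units" -> "unit"
    return lower

def normalize_query(query: str) -> str:
    # Keep only word-like tokens, then normalize each one.
    tokens = re.findall(r"[A-Za-z0-9']+", query)
    return " ".join(naive_singularize(t) for t in tokens)

plural = normalize_query("determining budgetary units")
singular = normalize_query("determining budgetary unit")
print(plural == singular)
```

With this approach both forms of the query are reduced to the same string before being sent to the service, so they should be scored identically.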
