Handling Search Queries in Azure AI Search

Aakash Bhaikatti 20 Reputation points
2024-06-10T08:46:23.2366667+00:00

Hello,

I'm working on a project where we've implemented a RAG-like document processing pipeline using the Azure AI Search service. Here's a brief overview of our setup:

  1. We extract text from a PDF and store it in a .txt file inside Azure Blob Storage.
  2. Using Azure Blob Storage, we upload the document to an Azure AI Search index.
  3. We use an OpenAI prompt to generate 20 product keywords based on the content of the file, which are then stored in the index.
  4. We are employing a simple hybrid search to retrieve relevant answers from the indexed document.

The system works well for most queries. However, we run into issues when users submit queries that omit spaces or hyphens, use different letter casing, include special characters, or contain alphanumeric combinations. These queries do not return the relevant results we expect.

Our specific questions are:

  1. How can we improve the search capability to handle queries with special characters, no spaces, hyphens, mixed cases, and alphanumeric combinations?
  2. Are there specific settings or configurations in Azure AI Search that can help in normalizing such queries before processing?
  3. What are the best practices for pre-processing or transforming user queries to enhance the search accuracy in this context?

Thank you for your assistance!

Accepted answer
  1. Sina Salam 22,031 Reputation points Volunteer Moderator
    2024-06-10T13:57:46.2466667+00:00

    Hello Aakash Bhaikatti,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are looking for help with handling search queries in Azure AI Search and that you have three specific questions.

    Solution

    I will provide the solution based on the scenario given and your questions.

    Q1:

    How can we improve the search capability to handle queries with special characters, no spaces, hyphens, mixed cases, and alphanumeric combinations?

    To improve the search capability to handle queries with special characters, no spaces, hyphens, mixed cases, and alphanumeric combinations in Azure AI Search, you can employ a combination of custom analyzers, query preprocessing, synonym maps, scoring profiles, and continuous monitoring.

    Flexible filtering, faceting, and sorting in Azure Cognitive Search. https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/flexible-filtering-faceting-and-sorting-in-azure-cognitive/ba-p/3038442.
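
As a hedged illustration of the synonym-map part of this advice, the sketch below builds a request body in the shape the Azure AI Search REST API uses for synonym maps (`name`, `format`, and a newline-separated `synonyms` string in Solr format). The map name `product-synonyms` and the rules themselves are hypothetical, and sending the payload to the service is left out.

```python
import json

# Hypothetical synonym rules; replace them with terms from your own catalog.
# "a, b" declares equivalent terms; "a => b" rewrites a to b.
RULES = [
    "productid, product id, prod id",
    "sku => product id",
]

def build_synonym_map(name, rules):
    """Return a synonym map definition as a plain dict."""
    return {
        "name": name,
        "format": "solr",              # Solr synonym format
        "synonyms": "\n".join(rules),  # one rule per line
    }

synonym_map = build_synonym_map("product-synonyms", RULES)
print(json.dumps(synonym_map, indent=2))
```

A searchable field opts in by listing the map in its `synonymMaps` property. Synonyms are expanded at query time, so adding or changing rules should not require re-indexing the documents.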

    Q2:

    Are there specific settings or configurations in Azure AI Search that can help in normalizing such queries before processing?

    Yes. Azure AI Search (formerly Azure Cognitive Search) has settings that normalize text before matching. Custom analyzers, tokenizers, token filters, and synonym maps apply to full-text search, while normalizers apply to filterable, facetable, and sortable fields. Normalizers perform light text transformations that improve results, such as making casing consistent, converting accents and diacritics to their ASCII equivalents, and mapping characters like - and whitespace to a user-specified character.

    Partial terms, patterns, and special characters. https://learn.microsoft.com/en-us/azure/search/search-query-partial-matching.
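
To make the analyzer settings more concrete, here is a minimal sketch of an index definition with a custom analyzer, written as a Python dict in the shape of the REST API payload. The index, field, analyzer, and filter names are all hypothetical. The idea, under these assumptions, is to tokenize on whitespace only and then let a word-delimiter token filter break codes such as "ProductID-123ABC" on hyphens, case changes, and letter/digit boundaries, while also keeping the joined and original forms.

```python
import json

# All names below (index, fields, analyzer, filter) are hypothetical.
index_definition = {
    "name": "products-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {
            "name": "content",
            "type": "Edm.String",
            "searchable": True,
            # Used at both indexing and query time
            "analyzer": "product_code_analyzer",
        },
    ],
    "analyzers": [
        {
            "name": "product_code_analyzer",
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            # Split on whitespace only, so hyphenated codes reach the filters intact
            "tokenizer": "whitespace",
            # Order matters: split on case changes before lowercasing
            "tokenFilters": ["split_codes", "lowercase", "asciifolding"],
        }
    ],
    "tokenFilters": [
        {
            "name": "split_codes",
            "@odata.type": "#Microsoft.Azure.Search.WordDelimiterTokenFilter",
            "splitOnCaseChange": True,   # "ProductID" -> "Product", "ID"
            "splitOnNumerics": True,     # "123ABC" -> "123", "ABC"
            "catenateAll": True,         # also emit the parts joined together
            "preserveOriginal": True,    # also keep the token as typed
        }
    ],
}

print(json.dumps(index_definition, indent=2))
```

Changing the analyzer on an existing field generally means rebuilding the index, so it is worth checking the tokens the analyzer produces (for example with the service's Analyze Text API) before re-indexing.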

    Q3:

    What are the best practices for pre-processing or transforming user queries to enhance the search accuracy in this context?

    Good practices for pre-processing or transforming user queries start with normalizing case. The Python function below combines several of them, with each step labeled in a comment:

    import re

    def preprocess_query(query):
        # Normalize case
        query = query.lower()
        # Remove special characters (keep letters, digits, whitespace, and hyphens)
        query = re.sub(r'[^a-z0-9\s-]', '', query)
        # Treat hyphens as spaces
        query = query.replace('-', ' ')
        # Split alphanumeric combinations (the query is already lowercase)
        query = re.sub(r'(\d)([a-z])', r'\1 \2', query)
        query = re.sub(r'([a-z])(\d)', r'\1 \2', query)
        # Remove extra whitespace
        query = ' '.join(query.split())
        # Expand query with synonyms (example dictionary).
        # Keys must be in normalized form (lowercase, no hyphens), otherwise they never match.
        synonyms = {
            "product": ["item", "good"],
            "prod id": ["product id"],
            "productid": ["product id"]
        }
        expanded_query = query
        for term, synonym_list in synonyms.items():
            # Substring match: "product" also matches inside "productid"
            if term in query:
                expanded_query += " " + " ".join(synonym_list)
        return expanded_query

    # Example usage
    user_query = "ProductID-123ABC"
    normalized_query = preprocess_query(user_query)
    print(normalized_query)  # Output: "productid 123 abc item good product id"
    

    You can read more about best practices and the steps to follow in Azure AI Search Database Selection: Optimizing Performance. https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-database-selection-optimizing-performance-and/ba-p/4155601.
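
One possible extension of the pre-processing approach above, offered as a sketch rather than a recommended recipe: generate both a split and a compact spelling of the query and send them together, joined with the `|` (OR) operator of the simple query syntax. The helper names `query_variants` and `to_search_text` are hypothetical, and whether the compact form helps depends on how the index was analyzed.

```python
import re

def query_variants(raw):
    """Return de-duplicated spellings of a query: split form, then compact form."""
    lowered = raw.lower().strip()
    # Keep only letters, digits, whitespace, and hyphens
    cleaned = re.sub(r"[^a-z0-9\s-]", "", lowered)
    # Split form: break on hyphens and on letter/digit boundaries
    split = cleaned.replace("-", " ")
    split = re.sub(r"(\d)([a-z])", r"\1 \2", split)
    split = re.sub(r"([a-z])(\d)", r"\1 \2", split)
    split = " ".join(split.split())
    # Compact form: drop every space and hyphen
    compact = re.sub(r"[\s-]+", "", cleaned)
    variants = []
    for variant in (split, compact):
        if variant and variant not in variants:
            variants.append(variant)
    return variants

def to_search_text(variants):
    """Join variants with OR, quoting multi-word variants as phrases."""
    parts = [f'"{v}"' if " " in v else v for v in variants]
    return " | ".join(parts)

variants = query_variants("ProductID-123ABC")
print(variants)                  # ['productid 123 abc', 'productid123abc']
print(to_search_text(variants))  # "productid 123 abc" | productid123abc
```

Quoting the split form as a phrase keeps its terms together and in order; dropping the quotes would match documents that contain any of the individual terms, which is broader but noisier.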

    References

    Source: Flexible filtering, faceting, and sorting in Azure Cognitive Search. Accessed 6/10/2024.

    Source: Partial terms, patterns, and special characters. Accessed 6/10/2024.

    Source: Azure AI Search Database Selection: Optimizing Performance. Accessed 6/10/2024.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    Please don't forget to close the thread by upvoting and accepting this as the answer if it is helpful, so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam

2 additional answers

  1. Aakash Bhaikatti 20 Reputation points
    2024-08-06T12:20:58.3833333+00:00

    Hi Sina,

    I hope you're doing well.

    I wanted to express my heartfelt thanks for your assistance with the problem statement. Your help was invaluable, and I truly appreciate the time and effort you put into providing such thoughtful guidance.

    I also wanted to apologize for my delayed response. I appreciate your patience and understanding.

    Thank you once again for your support!

  2. Aakash Bhaikatti 20 Reputation points
    2024-08-06T12:22:12.5366667+00:00

    I also have a similar problem, and I would appreciate it if you could look into it and help me out.

    I'm facing an issue with Azure AI Search and need some guidance.

    Setup:

    Extracted text from a website containing only products.

    Stored the extracted content in JSON format in an Azure AI Search index:

    ```json
    [
      {
        "url": "https://abc.com/groups/products/product1",
        "content": {
          "text": "website content",
          "tables": [only if there are table contents]
        }
      }
    ]
    ```

    API Implementation:

    Developed an API function `web_search()` for querying the Azure AI Search index.

    Combined citations from unstructured PDF data (which returns the answer, 3 documents, and product keywords) and the `web_search()` (which returns URL and keywords if matched).

    Problem:

    The index fails to return the correct URL and keywords every time a question is asked.

    The unstructured data API returns the correct results consistently.

    Question: How can I solve this issue to ensure the Azure AI Search index returns the correct URL and keywords every time a question is asked? This functionality was working correctly earlier but has recently started failing.

    Any help or suggestions would be greatly appreciated!

    Thank you.
