Help with Semantic Search on Large Table Using Azure OpenAI and Search

Chander, Ajay (Cognizant) 0 Reputation points
2024-11-12T17:35:46.0533333+00:00

I'm working on a POC for a semantic search project and would appreciate some advice on how to approach it better.

Objective: I need to retrieve relevant results based on user queries from a wide table with over 50 columns and more than 100,000 rows. My focus is on a key column, ‘description,’ containing long text, while the other columns are mostly integers, booleans, tiny integers, and dates.

Challenge: The user could ask questions that involve the semantic meaning of the content in the table (not just exact matches). Additionally, queries might involve 1-3 filter conditions across different columns, and the output should still yield the most relevant rows.

What I've Tried:

I'm using Azure OpenAI and Azure AI Search resources.

I attempted concatenating about 10 important columns, embedding this combined text, and using it for vector search.

The issue: It works somewhat when the query relates closely to the ‘description’ column, but when filters apply to other columns, the results aren’t as relevant as I’d hoped.

Question: Has anyone tackled a similar use case? Any tips on refining this approach or other tools I should consider? Any advice is appreciated, especially since I’m new to this type of problem. Thanks!

Azure AI Search

1 answer

  1. brtrach-MSFT 17,731 Reputation points Microsoft Employee Moderator
    2024-11-13T01:33:16.2966667+00:00

    @Chander, Ajay (Cognizant) Here are some strategies and tips that might help you refine your approach:

    1. Chunking and Embedding

    Since you're dealing with a large table and long text in the 'description' column, consider chunking your text data into smaller, semantically meaningful segments. This can help maintain context and improve the quality of embeddings. You can use techniques like:

      1. Fixed-size chunks (e.g., 200-300 words) with some overlap to preserve context.
      2. Variable-sized chunks based on natural language structures (like sentences or paragraphs) to ensure meaningful segments.

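
As a rough illustration of the fixed-size option, here is a minimal Python sketch that splits a description into word windows with overlap. The function name `chunk_words` and the defaults (250 words, 50-word overlap) are my own assumptions, not anything prescribed by Azure; tune them for your data.

```python
def chunk_words(text: str, chunk_size: int = 250, overlap: int = 50) -> list[str]:
    """Split text into windows of up to chunk_size words.

    Consecutive windows share `overlap` words so that context spanning a
    boundary appears in both chunks. (Hypothetical helper for illustration.)
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk would then be embedded separately and stored alongside the id of the row it came from, so that a matching chunk can be traced back to its row.
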
    2. Enhanced Query Processing

    To improve relevance when applying filters:

      1. Combine Filters with Semantic Search: Use a two-step approach where you first retrieve candidate rows by embedding similarity to the query and then apply the filter conditions to those results (Azure AI Search can also apply the filter before the vector search runs). This can help maintain semantic relevance while still adhering to filter conditions.
      2. Use a Hybrid Search Approach: Implement both keyword and semantic search. Start with a keyword search to narrow down results based on filters, then apply semantic search on the filtered results to rank them by relevance.

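
To make the filter-plus-vector idea concrete, here is a hedged Python sketch that builds an OData filter string from 1-3 conditions and wraps it in a request body for a vector query. The field names (`description_vector`, `is_active`, `priority`, `created_date`, `id`) are hypothetical, and the body shape reflects my understanding of the Azure AI Search REST "Search Documents" request for vector queries; please verify the property names against the API version you use.

```python
from datetime import date

def odata_literal(value) -> str:
    """Render a Python value as an OData literal for a filter expression."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if type(value) is date:  # plain dates only, rendered as midnight UTC
        return f"{value.isoformat()}T00:00:00Z"
    return "'" + str(value).replace("'", "''") + "'"  # strings: double single quotes

def build_filter(conditions) -> str:
    """Join (field, operator, value) triples with 'and'."""
    allowed = {"eq", "ne", "gt", "ge", "lt", "le"}
    parts = []
    for field, op, value in conditions:
        if op not in allowed:
            raise ValueError(f"unsupported operator: {op}")
        parts.append(f"{field} {op} {odata_literal(value)}")
    return " and ".join(parts)

def build_search_body(query_vector, conditions, top=10) -> dict:
    """Assemble a request body: filter on structured columns, rank by vector similarity."""
    return {
        "filter": build_filter(conditions),
        "vectorFilterMode": "preFilter",  # assumption: filter first, then vector search
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": query_vector,          # embedding of the user's question
                "fields": "description_vector",  # hypothetical vector field name
                "k": top,
            }
        ],
        "select": "id, description",  # hypothetical field names
        "top": top,
    }
```

With this layout only the 'description' text is embedded, while the integer, boolean, and date columns stay as filterable fields, so a condition like `priority ge 3` is evaluated exactly rather than approximated through the embedding.
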
    3. Fine-tuning Embeddings

    Make sure you're using a robust embedding model. Azure OpenAI provides models like text-embedding-ada-002, which are designed for semantic tasks. Experiment with different models to see which yields the best results for your specific data.
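
One simple way to compare embedding models is to collect a small set of test queries with hand-labelled relevant rows and measure recall at k for each model. The sketch below is a generic metric under that assumption, not an Azure feature; `recall_at_k` and the dictionary layout are names I made up for illustration.

```python
def recall_at_k(retrieved: dict[str, list[str]],
                relevant: dict[str, set[str]],
                k: int = 10) -> float:
    """Mean fraction of each query's relevant row ids that appear in its top-k results.

    retrieved: query -> ranked list of row ids returned by the search
    relevant:  query -> set of row ids a human judged relevant
    """
    scores = []
    for query, expected in relevant.items():
        if not expected:
            continue  # skip queries with no labelled relevant rows
        top_k = set(retrieved.get(query, [])[:k])
        scores.append(len(top_k & expected) / len(expected))
    return sum(scores) / len(scores) if scores else 0.0
```

Running the same labelled queries against an index built with each candidate model and comparing the scores gives a like-for-like basis for choosing between them.
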

    4. Utilizing Azure AI Search Features

    Leverage Azure AI Search's capabilities:

      1. Skillsets: Create skillsets that can preprocess your data, such as extracting key phrases or summarizing content, which can enhance the embeddings.
      2. Custom Scoring Profiles: Adjust scoring profiles to prioritize certain columns or conditions based on your business logic.
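
For the scoring profile idea, here is a sketch of what a profile entry in the index definition might look like, written as a Python dict. The profile name and the `created_date` field are hypothetical, and the weights and duration are arbitrary placeholders. As far as I know, scoring profiles act on the keyword (text) part of a query, so check how they interact with vector queries in your API version.

```python
import json

# Hypothetical scoring profile: weight matches in 'description' more heavily
# and boost rows whose 'created_date' falls within the last 90 days.
scoring_profile = {
    "name": "boostRecent",
    "text": {"weights": {"description": 3}},
    "functions": [
        {
            "type": "freshness",
            "fieldName": "created_date",   # must be a filterable date field
            "boost": 2,
            "interpolation": "linear",
            "freshness": {"boostingDuration": "P90D"},  # ISO 8601 duration
        }
    ],
}

# This fragment would go under "scoringProfiles" in the index definition.
index_fragment = json.dumps({"scoringProfiles": [scoring_profile]}, indent=2)
```
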