Help with Semantic Search on Large Table Using Azure OpenAI and Search

Question

Chander, Ajay (Cognizant)

I'm working on a POC for a semantic search project and would appreciate some advice on how to approach it better.

Objective: I need to retrieve relevant results based on user queries from a wide table with over 50 columns and more than 100,000 rows. My focus is on a key column, ‘description,’ which contains long text; the other columns are mostly integers, booleans, tiny integers, and dates.

Challenge: The user could ask questions that involve the semantic meaning of the content in the table (not just exact matches). Additionally, queries might involve 1-3 filter conditions across different columns, so the output should still yield the most relevant rows.

What I've Tried:

I'm using Azure OpenAI and Search resources.

I attempted concatenating about 10 important columns, embedding this combined text, and using it for vector search.

The issue: It works somewhat when the query relates closely to the ‘description’ column, but when filters apply to other columns, the results aren’t as relevant as I’d hoped.

Question: Has anyone tackled a similar use case? Any tips on refining this approach or other tools I should consider? Any advice is appreciated, especially since I’m new to this type of problem. Thanks!

Answer 1

@Chander, Ajay (Cognizant) Here are some strategies and tips that might help you refine your approach:

Chunking and Embedding

Since you're dealing with a large table and long text in the 'description' column, consider chunking your text data into smaller, semantically meaningful segments. This can help maintain context and improve the quality of embeddings. You can use techniques like:

1. Fixed-size chunks (e.g., 200-300 words) with some overlap to preserve context.
2. Variable-sized chunks based on natural language structures (like sentences or paragraphs) to ensure meaningful segments.
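
As a rough illustration of the fixed-size option, here is a minimal sketch of word-based chunking with overlap. The function names, the default sizes, and the record layout (`chunk_id`, `row_id`, `text`) are assumptions made for this example, not part of any Azure API. The point is that each chunk keeps the id of the row it came from, so hits can be mapped back to the table.

```python
def chunk_words(text: str, chunk_size: int = 250, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so context is not cut abruptly.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def chunk_row(row_id: str, description: str, **kwargs) -> list[dict]:
    """Emit one record per chunk, each carrying the id of its source row."""
    return [
        {"chunk_id": f"{row_id}-{i}", "row_id": row_id, "text": chunk}
        for i, chunk in enumerate(chunk_words(description, **kwargs))
    ]
```

Each record's `text` would then be embedded separately. Splitting on sentences or paragraphs instead would only change `chunk_words`.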

Enhanced Query Processing

To improve relevance when applying filters:

Combine Filters with Semantic Search: Use a two-step approach where you first retrieve the closest matches by embedding similarity and then apply the filter conditions to those results. Retrieve more candidates than you ultimately need, because filtering afterwards can discard many of them. This can help maintain semantic relevance while still adhering to the filter conditions.
Use a Hybrid Search Approach: Implement both keyword and semantic search. First narrow down the candidate rows with a keyword search and the filter conditions, then apply semantic search to the filtered results to rank them by relevance.
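
The difference between the two orderings above can be seen in a small in-memory sketch. It uses toy two-dimensional vectors and made-up columns (`region`, `active`), and the function names are hypothetical. A real index would do this work server-side, so treat it only as an illustration of the idea.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(rows: list[dict], query_vec: list[float], k: int) -> list[dict]:
    """Return the k rows whose embedding is most similar to the query."""
    by_similarity = sorted(
        rows, key=lambda r: cosine(r["embedding"], query_vec), reverse=True
    )
    return by_similarity[:k]


def rank_then_filter(rows, query_vec, keep: Callable[[dict], bool], k, candidates):
    """Two-step approach: take the top `candidates` by similarity, then filter."""
    return [r for r in rank(rows, query_vec, candidates) if keep(r)][:k]


def filter_then_rank(rows, query_vec, keep: Callable[[dict], bool], k):
    """Apply the filter conditions first, then rank only the surviving rows."""
    return rank([r for r in rows if keep(r)], query_vec, k)


# Toy data: structured columns plus a precomputed embedding per row.
ROWS = [
    {"id": 1, "region": "EU", "active": True, "embedding": [1.0, 0.0]},
    {"id": 2, "region": "US", "active": True, "embedding": [0.9, 0.1]},
    {"id": 3, "region": "EU", "active": False, "embedding": [0.8, 0.2]},
    {"id": 4, "region": "US", "active": True, "embedding": [0.0, 1.0]},
]


def us_and_active(row: dict) -> bool:
    return row["region"] == "US" and row["active"]


QUERY = [1.0, 0.0]
too_few = rank_then_filter(ROWS, QUERY, us_and_active, k=2, candidates=1)
enough = rank_then_filter(ROWS, QUERY, us_and_active, k=2, candidates=4)
prefiltered = filter_then_rank(ROWS, QUERY, us_and_active, k=2)
```

With `candidates=1` the single nearest row fails the filter, so nothing comes back. Widening the candidate pool or filtering first both return rows 2 and 4. In Azure AI Search the filter conditions are normally passed as an OData `filter` expression on the query rather than applied in client code.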

Fine-tuning Embeddings

Make sure you're using a robust embedding model. Azure OpenAI provides models like text-embedding-ada-002, which are designed for semantic tasks. Experiment with different models to see which yields the best results for your specific data.
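
One way to make "experiment with different models" concrete is to score each candidate on a small set of queries with known relevant rows. The sketch below computes recall@k. The `search` callable is a hypothetical wrapper you would write around "embed the query with model X and query the index", and the labelled queries and stub results are made-up examples.

```python
from typing import Callable


def recall_at_k(ranked_ids: list[str], relevant_ids: list[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(
    search: Callable[[str], list[str]],
    labelled_queries: dict[str, list[str]],
    k: int = 10,
) -> float:
    """Average recall@k over all labelled queries for one search setup."""
    scores = [
        recall_at_k(search(query), relevant, k)
        for query, relevant in labelled_queries.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labelled set: query text -> ids of rows a human judged relevant.
LABELLED = {"q1": ["a", "b"], "q2": ["c"]}

# Stub standing in for a real embed-and-search call with one model.
STUB_RESULTS = {"q1": ["a", "x", "b"], "q2": ["y", "z", "c"]}


def stub_search(query: str) -> list[str]:
    return STUB_RESULTS[query]


score_at_2 = mean_recall_at_k(stub_search, LABELLED, k=2)
score_at_3 = mean_recall_at_k(stub_search, LABELLED, k=3)
```

Running the same labelled set through one `search` function per embedding model gives a comparable number for each, which is usually more informative than eyeballing a few queries.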

Utilizing Azure AI Search Features

Leverage Azure AI Search's capabilities:

Skillsets: Create skillsets that can preprocess your data, such as extracting key phrases or summarizing content, which can enhance the embeddings.
Custom Scoring Profiles: Adjust scoring profiles to prioritize certain columns or conditions based on your business logic.
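
For the scoring-profile suggestion, the sketch below builds a profile definition as a Python dict in the shape the Azure AI Search index schema uses for `scoringProfiles` entries. The profile name, the field names (`description`, `last_updated`), and the weights are assumptions for illustration. Please check the current Azure AI Search documentation for the exact schema and for how scoring profiles interact with vector queries before relying on it.

```python
import json


def build_scoring_profile(
    name: str,
    text_weights: dict[str, float],
    freshness_field: str,
    boost: float = 2.0,
    duration: str = "P30D",
) -> dict:
    """Build one scoring profile entry for an index definition.

    text_weights boosts matches in the listed searchable text fields;
    the freshness function boosts rows whose date field is recent.
    """
    return {
        "name": name,
        "text": {"weights": text_weights},
        "functions": [
            {
                "type": "freshness",
                "fieldName": freshness_field,  # assumed to be a date field
                "boost": boost,
                "interpolation": "linear",
                "freshness": {"boostingDuration": duration},  # ISO 8601 duration
            }
        ],
        "functionAggregation": "sum",
    }


# Hypothetical field names; replace with columns from your own index.
profile = build_scoring_profile(
    name="prefer-description-and-recent",
    text_weights={"description": 3.0},
    freshness_field="last_updated",
)
profile_json = json.dumps(profile, indent=2)
```

The resulting JSON would go into the index definition, and a query would then select the profile by name.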
