@Chander, Ajay (Cognizant) Here are some strategies and tips that might help you refine your approach:
- Chunking and Embedding
- Since you're dealing with a large table and long text in the 'description' column, consider chunking your text data into smaller, semantically meaningful segments. This helps maintain context and improves the quality of the embeddings. You can use techniques like the following (a sketch follows this list):
- Fixed-size chunks (e.g., 200-300 words) with some overlap to preserve context.
- Variable-sized chunks based on natural language structures (like sentences or paragraphs) to ensure meaningful segments.
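As a rough illustration, here is a minimal Python sketch of fixed-size, word-based chunking with overlap. The `chunk_text` helper, the chunk size of 250 words, and the 50-word overlap are all illustrative choices, not tuned recommendations:

```python
# Minimal sketch of fixed-size chunking with overlap.
# chunk_size and overlap values are illustrative, not tuned recommendations.
def chunk_text(text: str, chunk_size: int = 250, overlap: int = 50) -> list[str]:
    """Split text into ~chunk_size-word chunks, overlapping by `overlap`
    words so that context is preserved across chunk boundaries."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example usage: chunk each row's description before embedding/indexing.
# (Field names "id" and "description" are placeholders for your table schema.)
# for row in rows:
#     for i, chunk in enumerate(chunk_text(row["description"])):
#         doc = {"id": f"{row['id']}-{i}", "content": chunk}
```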
- Enhanced Query Processing
To improve relevance when applying filters:
- Combine Filters with Semantic Search: Use a two-step approach where you first retrieve relevant embeddings based on the query and then apply filters on the results. This can help maintain semantic relevance while still adhering to filter conditions.
- Use a Hybrid Search Approach: Implement both keyword and semantic search. Start with a keyword search to narrow down results based on your filters, then apply semantic search on the filtered results to rank them by relevance (see the sketch after this list).
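Here is a minimal sketch of a hybrid query (keyword + vector) with an OData filter using the azure-search-documents SDK. The endpoint, key, index name, and field names ("content", "content_vector", "category") are placeholders you would replace with your own:

```python
# Minimal sketch: hybrid keyword + vector query with an OData filter.
# Service endpoint, key, index, and field names are assumptions.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<your-query-key>"),
)

def hybrid_search(query_text: str, query_vector: list[float], category: str):
    # The keyword search and filter narrow the candidate set; the vector
    # query ranks that set by semantic similarity to the query embedding.
    results = search_client.search(
        search_text=query_text,
        vector_queries=[
            VectorizedQuery(
                vector=query_vector,
                k_nearest_neighbors=10,
                fields="content_vector",
            )
        ],
        filter=f"category eq '{category}'",
        select=["id", "content"],
        top=10,
    )
    return list(results)
```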
- Choosing an Embedding Model
Make sure you're using a robust embedding model. Azure OpenAI provides models like text-embedding-ada-002, which are designed for semantic tasks. Experiment with different models to see which yields the best results for your specific data.
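For reference, a minimal sketch of generating embeddings with the Azure OpenAI client; the endpoint, API key, API version, and deployment name are placeholders for your own resource (the model name must match your deployment name in Azure):

```python
# Minimal sketch of calling the Azure OpenAI embeddings endpoint.
# Endpoint, key, API version, and deployment name are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

def embed(texts: list[str]) -> list[list[float]]:
    # "text-embedding-ada-002" here refers to your deployment name, which
    # must match what you configured in the Azure OpenAI resource.
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=texts,
    )
    return [item.embedding for item in response.data]
```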
- Utilizing Azure AI Search Features
Leverage Azure AI Search's capabilities:
- Skillsets: Create skillsets that can preprocess your data, such as extracting key phrases or summarizing content, which can enhance the embeddings.
- Custom Scoring Profiles: Adjust scoring profiles to prioritize certain columns or conditions based on your business logic (a sketch follows below).
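As a rough illustration, a scoring profile can be declared through the azure-search-documents index models; the profile name, field names, and weights below are assumptions you would adapt to your index:

```python
# Minimal sketch of a custom scoring profile via the Python SDK.
# Profile name, field names, and weights are illustrative assumptions.
from azure.search.documents.indexes.models import ScoringProfile, TextWeights

scoring_profile = ScoringProfile(
    name="boost-description",
    # Weight matches in "title" and "description" more heavily than other
    # searchable fields when computing the relevance score.
    text_weights=TextWeights(weights={"title": 2.0, "description": 1.5}),
)

# Attach it to the index definition and optionally make it the default:
# index = SearchIndex(
#     name="<your-index>",
#     fields=[...],
#     scoring_profiles=[scoring_profile],
#     default_scoring_profile="boost-description",
# )
```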