Azure AI Search document preprocessing

Dezső Kántor 0 Reputation points
2024-09-03T15:52:00.17+00:00

To ensure optimal input for indexing and vectorizing, it's important to prepare your data properly. Azure OpenAI On Your Data supports various file types, including HTML, PDF, and Markdown files. If you're converting data from an unsupported format into a supported one, make sure the conversion doesn't cause significant data loss or add unexpected noise to your data. If your files have special formatting, such as tables, columns, or bullet points, or if your documents and datasets contain long text, prepare your data with the data preparation script available on GitHub. The script chunks the data so that the model's responses are more accurate, and it also supports scanned PDF files and images. Thank you in advance!
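To illustrate the chunking step mentioned above, here is a minimal sketch of splitting long text into overlapping, word-based chunks before indexing. It is not the data preparation script from GitHub; the function name `chunk_text` and the parameters `max_words` and `overlap` are hypothetical and chosen only for this example.

```python
from typing import List


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into word-based chunks that overlap by `overlap` words.

    Illustrative only: the real preparation script may chunk by tokens,
    sentences, or document structure instead of plain whitespace words.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in the range [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, max_words=4, overlap=1):
        print(chunk)
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence that straddles a boundary is less likely to be lost when each chunk is indexed or vectorized on its own.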

Azure AI Search
