Azure AI Search document preprocessing
To get the best input for indexing and vectorization, prepare your data before you ingest it. Azure OpenAI On Your Data supports various file types, including HTML, PDF, and Markdown files. If you convert data from an unsupported format into a supported one, make sure the conversion doesn't cause significant data loss or add unexpected noise. If your files have special formatting, such as tables, columns, or bullet points, or if your documents and datasets contain long text, prepare them with the data preparation script available on GitHub. The script chunks data so that the model's responses are more accurate. It also supports scanned PDF files and images.
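To illustrate what chunking means in practice, the following is a minimal sketch and not the data preparation script itself. It assumes that whitespace-separated words are an acceptable stand-in for tokens and that a small overlap between consecutive chunks helps preserve context. The function name `chunk_text` and its parameters `chunk_size` and `overlap` are hypothetical and chosen only for this example.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words. Words are a rough
    stand-in for tokens here (an assumption of this sketch).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk consists only of overlap words.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Smaller chunks tend to make retrieval more precise, and larger chunks keep more context together, so the right values depend on your documents. A production pipeline would typically count model tokens rather than words and respect structural boundaries such as headings and tables.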