Hello @Ritesh Panditi, thanks for using the Microsoft Q&A platform.
I would suggest the prebuilt Layout model with the latest 2024-02-29-preview version as the best option for your use case. The layout model's Markdown-formatted output is LLM-friendly and integrates smoothly into your workflows. Markdown is widely used to enable semantic chunking in RAG (Retrieval-Augmented Generation). You can refer to the documentation for more details.
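To illustrate what semantic chunking of the Markdown output could look like, here is a minimal sketch that splits a Markdown string into one chunk per heading. The function name chunk_markdown_by_heading and the sample text are my own illustrations and not part of any Azure SDK. The sketch assumes ATX-style headings (lines starting with #) and does not handle # characters inside fenced code blocks.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "## Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown_by_heading(markdown: str) -> list:
    """Split Markdown text into chunks, starting a new chunk at each heading.

    Hypothetical helper for illustration; returns a list of dicts with
    "heading" (None for text before the first heading) and "text".
    """
    chunks = []
    heading, lines = None, []

    def flush():
        # Keep a chunk if it has a heading or any non-blank body text.
        if heading is not None or any(line.strip() for line in lines):
            chunks.append({"heading": heading, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


# Sample text standing in for the model's Markdown output (made up for this sketch).
sample = "Intro text\n\n# Title\n\nBody one.\n\n## Section A\n\nBody two.\n"
for chunk in chunk_markdown_by_heading(sample):
    print(chunk["heading"], "->", chunk["text"])
```

Each chunk can then be embedded and indexed separately, so a retrieved passage always carries the heading it belongs to.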
The "prebuilt-layout" model can detect figures and extract information such as their spatial locations, text spans, related text elements, and captions.
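As a rough sketch of how the figure information might be read from the analysis result, the snippet below walks a result that has already been loaded into a Python dict. The field names (figures, boundingRegions, pageNumber, polygon, spans, elements, caption, content) are my assumption of the JSON shape returned by this API version, so please verify them against your actual response. The function name summarize_figures and the sample data are hypothetical.

```python
def summarize_figures(analyze_result: dict) -> list:
    """Collect location, spans, related elements, and caption for each figure.

    Assumes `analyze_result` is the "analyzeResult" part of the response,
    already parsed from JSON. Field names are assumptions; check your output.
    """
    summaries = []
    for figure in analyze_result.get("figures", []):
        regions = figure.get("boundingRegions", [])
        caption = figure.get("caption") or {}
        summaries.append({
            # Pages the figure appears on (spatial location, coarse).
            "pages": sorted({region["pageNumber"] for region in regions}),
            # Polygon coordinates per region (spatial location, fine).
            "polygons": [region.get("polygon", []) for region in regions],
            # (offset, length) pairs into the document content string.
            "spans": [(s["offset"], s["length"]) for s in figure.get("spans", [])],
            # References to related text elements, e.g. paragraphs.
            "elements": figure.get("elements", []),
            # Caption text, or None when the figure has no caption.
            "caption": caption.get("content"),
        })
    return summaries


# Made-up sample data in the assumed shape, for illustration only.
sample_result = {
    "figures": [
        {
            "boundingRegions": [{"pageNumber": 2, "polygon": [1.0, 1.0, 4.0, 1.0, 4.0, 3.0, 1.0, 3.0]}],
            "spans": [{"offset": 120, "length": 45}],
            "elements": ["/paragraphs/7"],
            "caption": {"content": "Figure 1: Example chart"},
        }
    ]
}
print(summarize_figures(sample_result))
```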
Additionally, the "prebuilt-layout" model can identify two types of roles in a document layout: geometric roles (such as text, tables, figures, and selection marks) and logical roles (such as titles, headings, and footers).
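For the logical roles, one possible way to use them is to group paragraphs by role, for example to drop page headers and footers before chunking. This is only a sketch: the paragraphs, role, and content field names and the role values in the sample (title, sectionHeading, pageFooter) are my assumptions about the response shape, and "body" is a placeholder label I chose for paragraphs that carry no role.

```python
from collections import defaultdict


def group_paragraphs_by_role(analyze_result: dict) -> dict:
    """Group paragraph text by its logical role.

    Assumes each paragraph is a dict with "content" and an optional "role".
    Paragraphs without a role are filed under the placeholder key "body".
    """
    grouped = defaultdict(list)
    for paragraph in analyze_result.get("paragraphs", []):
        role = paragraph.get("role", "body")
        grouped[role].append(paragraph.get("content", ""))
    return dict(grouped)


# Made-up sample data in the assumed shape, for illustration only.
sample_result = {
    "paragraphs": [
        {"role": "title", "content": "Annual Report"},
        {"role": "sectionHeading", "content": "1. Overview"},
        {"content": "Revenue grew during the year."},
        {"role": "pageFooter", "content": "Page 1 of 10"},
    ]
}
grouped = group_paragraphs_by_role(sample_result)
print(grouped.get("title"), grouped.get("body"))
```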
The Azure AI Vision Image Analysis service can extract a wide variety of visual features from your images. For example, it can determine whether an image contains adult content, find specific brands or objects, or find human faces.
Please review these features and choose the one that best fits your use case.
I hope this helps.
Regards,
Vasavi
- Please accept the answer and vote 'yes' if you found it helpful, to support the community. Thanks.