Share via

AI foundry

Tomasz Świercz 20 Reputation points
2025-05-10T14:33:25.3566667+00:00

Hello,

I would like to create a project in the AI ​​foundry project that will allow reading data from pdf files. In the project, using the Azure Search AI service, pdf files would be vectorized. Then, in the Prompt flow, I would like to connect the o3 mini or o4 mini model to the vectorized pdf files. The Prompt flow would connect to Excel using the API. In AI foundry, I noticed that in the test environment there is no possibility to test the o3 mini and o4 mini models on your own vectorized files, unlike other models, such as GPT 4.1. I have a question about using the AI ​​model in the Prompt flow. Can only some models be used there to work on your own vectorized files, except for e.g. o3 mini and o4 mini? Additionally, I would like to learn some information about the Azure AI Search service. I noticed that it is not possible to vectorize pdf files larger than 16 MB there. So is it possible to vectorize multiple pdf files at once, none of which will weigh more than 16 MB, but together they will weigh up to about 1 GB. Is this also limited? Or is there also a limit to the number of pdf files that can be vectorized at once?

Best regards, Tomasz Świercz

Foundry Tools
Foundry Tools

Formerly known as Azure AI Services or Azure Cognitive Services is a unified collection of prebuilt AI capabilities within the Microsoft Foundry platform

0 comments No comments

Answer accepted by question author

Marcin Policht 90,805 Reputation points MVP Volunteer Moderator
2025-05-10T16:08:17.25+00:00

You're asking a well-structured set of questions that touch on Azure AI Foundry Prompt Flow integration with models, Azure AI Search limitations, and vectorization of PDF documents.

1. Using o3 mini or o4 mini in Prompt Flow with your own vectorized files

You're correct — not all models can currently be used with your own vectorized files in Prompt Flow, particularly in Azure AI Studio (AI Foundry).

  • Supported Models: As of now, GPT-4, GPT-3.5, and some OSS models (like Mistral or Phi) are officially supported in retrieval-augmented generation (RAG) scenarios with your own data.
  • Limitations for o3 mini / o4 mini:
    • These models are not yet fully integrated into the custom data (RAG) experience in Prompt Flow or AI Studio.
    • They may work in custom deployments or API scenarios, but the GUI experience (e.g., Test Chat) currently doesn't support o3/o4 mini with your indexed data.
  • Workaround: You could export the vectorized data via API and call o3/o4 mini manually with embedded context, but that requires custom code — not the drag-and-drop Prompt Flow.

Only selected models like GPT-4.1 are currently usable with your own vectorized files in Prompt Flow’s native UI, unless you manually build the RAG pipeline using APIs.

2. Azure AI Search – PDF size and vectorization limits

File size limits

  • Each file to be vectorized by Azure AI Search via Azure AI Document Intelligence or Indexer pipelines must be ≤ 16 MB.
  • This is a hard limit due to service constraints on Document Intelligence and AI Search.

Total volume (e.g., 1 GB of files)

  • Yes, you can vectorize many PDF files individually that are each under 16 MB.
  • There is no global 1 GB limit — you’re only constrained by:
    • Index storage quotas (scalable per pricing tier).
    • Concurrency and throttling (e.g., how many files you process at once via an indexer or pipeline).
  • If you batch-upload 1 GB worth of smaller PDFs (e.g., 100 PDFs at 10 MB each), that is allowed — they’ll just be indexed separately.

Number of files

  • There's no strict file number limit either, but performance will vary depending on:
    • Your Azure AI Search tier (Basic, Standard, etc.).
    • Your indexer settings and AI enrichment concurrency.
    • API rate limits if using REST/SDK-based ingestion.

If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

hth

Marcin

Was this answer helpful?


0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.