Hello Parul Paul,
Welcome to Microsoft Q&A, and thank you for reaching out.
You asked about the difference between query and test_case_description. They serve different purposes in Agent Evaluation:
query
- This is the actual user input that is sent to the Agent during evaluation.
- It is the prompt that triggers the Agent’s full execution flow (tools, RAG, reasoning, and response).
- During automatic evaluation, only the query field is sent to the Agent.
test_case_description
- This is metadata for humans, not an executable input.
- It is intended to describe the scenario, intent, or context of the test case (for example, what behavior is being validated).
- It does not affect how the Agent runs or how the response is generated.
To summarize, query is the actual prompt that is sent to the agent during an evaluation run, while test_case_description is only metadata: a human-readable summary of the scenario or intent behind that query. It helps human reviewers understand what is being tested, but it is not sent to the agent during automatic evaluation.
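To make the distinction concrete, here is a minimal, self-contained Python sketch. The dataset rows, `fake_agent`, and `run_row` are hypothetical stand-ins, not Foundry APIs; the sketch only illustrates the behaviour described above, where the runner forwards `query` to the agent and keeps `test_case_description` as a label for human reviewers.

```python
import json

# Hypothetical evaluation dataset in JSON Lines form: one test case per line.
DATASET_JSONL = """\
{"query": "What is the status of order 1234?", "test_case_description": "Agent should call the order-lookup tool"}
{"query": "Cancel my subscription", "test_case_description": "Agent should ask for confirmation before cancelling"}
"""

# Records every prompt the (hypothetical) agent receives, so we can inspect it.
received_prompts: list[str] = []

def fake_agent(prompt: str) -> str:
    """Stand-in for the deployed agent; it only ever sees the prompt string."""
    received_prompts.append(prompt)
    return f"(agent reply to: {prompt})"

def run_row(row: dict) -> dict:
    # Only the query is sent to the agent.
    response = fake_agent(row["query"])
    return {
        # The description travels with the result for human reviewers only.
        "description": row.get("test_case_description", ""),
        "query": row["query"],
        "response": response,
    }

rows = [json.loads(line) for line in DATASET_JSONL.splitlines() if line.strip()]
results = [run_row(r) for r in rows]
for r in results:
    print(f"[{r['description']}] {r['query']} -> {r['response']}")
```

Inspecting `received_prompts` afterwards shows that the agent saw only the two query strings; the descriptions appear solely in the printed report.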
When using Synthetic Data Generation, the dataset structure may look different because it is optimized for coverage and variety, not execution semantics. Internally, Foundry still maps the generated content into executable query inputs and descriptive metadata.
Automatic Agent Evaluation follows this flow:
- Each query is sent to the Agent exactly like a real user request.
- The Agent executes its full pipeline:
  - Instruction following
  - Tool invocation
  - RAG (if configured)
  - Final response generation
- The generated response is then evaluated using agent‑level metrics, such as:
  - Goal completion
  - Instruction adherence
  - Response quality and coherence
Under the hood, Foundry’s evaluation service reads each row’s query, sends it to the agent (including any tool calls you’ve configured), captures the full response and tool trace, and then runs your selected Agent Evaluators (Intent Resolution, Task Adherence, Tool Call Accuracy, etc.) against that query and response pair.
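The flow above can be sketched as a simple loop. This is a hypothetical, self-contained Python illustration, not Foundry's implementation: `toy_agent`, `AgentRun`, and the two scoring functions are made up for this example, and the toy evaluators are simple heuristics standing in for Foundry's LLM-judged evaluators (they do NOT reproduce Intent Resolution, Task Adherence, or Tool Call Accuracy). The harness treats the agent as a black box: it only sees the query going in and the response plus tool trace coming out.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentRun:
    response: str
    tool_calls: list[str] = field(default_factory=list)  # names of tools invoked, in order

def toy_agent(query: str) -> AgentRun:
    """Hypothetical black-box agent: the harness sees only its inputs and outputs."""
    if "order" in query.lower():
        return AgentRun(response="Order 1234 has shipped.", tool_calls=["lookup_order"])
    return AgentRun(response="Could you clarify your request?")

# Toy evaluators: each maps the (query, run) pair to a score in [0, 1].
# They are placeholders for real evaluators and are NOT their actual logic.
def responded(query: str, run: AgentRun) -> float:
    return 1.0 if run.response.strip() else 0.0

def used_a_tool(query: str, run: AgentRun) -> float:
    return 1.0 if run.tool_calls else 0.0

EVALUATORS: dict[str, Callable[[str, AgentRun], float]] = {
    "responded": responded,
    "used_a_tool": used_a_tool,
}

def evaluate_rows(queries: list[str]) -> list[dict]:
    results = []
    for q in queries:
        run = toy_agent(q)  # send the query exactly like a user request
        scores = {name: fn(q, run) for name, fn in EVALUATORS.items()}
        results.append({"query": q, "response": run.response,
                        "tool_calls": run.tool_calls, "scores": scores})
    return results

report = evaluate_rows(["Where is my order 1234?", "Hello"])
for row in report:
    print(row["query"], row["scores"])
```

Because scoring only looks at the query, response, and tool trace, any internal step (including retrieval) is invisible to it, which matches the black-box behaviour described in the next paragraph.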
Please note that the evaluation framework treats the Agent as a black-box system and evaluates end-to-end behaviour rather than individual internal steps.
Although RAG is integrated into Azure AI Foundry, agent evaluation works at a higher level. Metrics such as similarity, response completeness, and retrieval quality are designed for prompt-response evaluation and standalone RAG pipelines.
In an agent, retrieval is only one possible step: the Agent may also reason, branch, or use tools beyond RAG. Because of this, Foundry does not expose internal retrieval signals during Agent Evaluation.
Please consider using:
- RAG Evaluation when you want retrieval‑focused metrics
- Agent Evaluation when you want to validate overall task success
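One practical way to see why the two evaluation types differ is to look at the data each one needs. The sketch below is a hypothetical illustration: the field sets in `REQUIRED_FIELDS` are simplified assumptions, not the exact inputs of any Foundry evaluator (check the evaluator documentation for those). It shows that retrieval-focused metrics depend on the retrieved context, which an agent evaluation record does not carry because the agent is evaluated as a black box.

```python
# Illustrative (assumed) input requirements; consult the evaluator documentation
# for the exact fields each built-in evaluator expects.
REQUIRED_FIELDS = {
    "rag": {"query", "context", "response"},  # retrieval metrics need the retrieved context
    "agent": {"query", "response"},           # agent metrics judge end-to-end behaviour
}

def missing_fields(record: dict, kind: str) -> set[str]:
    """Return the fields a record lacks for the given evaluation kind."""
    return REQUIRED_FIELDS[kind] - set(record)

# A hypothetical agent evaluation record: no internal retrieval context is exposed.
agent_record = {
    "query": "Summarize our refund policy",
    "response": "Refunds are issued within 30 days.",
}

print(missing_fields(agent_record, "agent"))  # set()
print(missing_fields(agent_record, "rag"))    # {'context'}
```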
References
- Run evaluations from the Microsoft Foundry portal - Microsoft Foundry | Microsoft Learn
- Human Evaluation for Microsoft Foundry Agents - Microsoft Foundry | Microsoft Learn
- Agent Evaluators for Generative AI - Microsoft Foundry | Microsoft Learn
Please let me know if you have any questions.
Thank you!
Please 'Upvote' (thumbs-up) and 'Accept' the answer if the reply was helpful. This will benefit other community members who face the same issue.