Select models for agents in Microsoft Discovery

Microsoft Discovery is built on Microsoft Foundry Agent Service. All models available in the Foundry model catalog are accessible for Discovery agents. During public preview, we recommend OpenAI GPT-5.x series models for the best experience with Discovery agents.

This article helps you choose the right model for your agents based on task complexity, output quality, cost, and response time. The guidance applies to both Microsoft Discovery and Discovery app, with additional flexibility available in Discovery app for third-party model endpoints.

Applicability

Offering	Model guidance	Additional options
Microsoft Discovery	All guidance in this article applies. Models are deployed as workspace-level managed resources.	Models from the Foundry model catalog
Discovery app	Same model selection principles apply.	Supports bring-your-own-model (BYOM) endpoints from third-party platforms

Prerequisites

An active Azure subscription.
A deployed Microsoft Discovery workspace with at least one project. For setup instructions, see Get started with Microsoft Discovery infrastructure.
A chat model deployment configured at the workspace level. For details, see Create a chat model deployment.

Understand available GPT-5.x models

The following table summarizes the OpenAI GPT-5.x models recommended for Discovery agents during public preview.

Model	Context window	Strength	Relative cost	Response time
GPT-5.2	400,000 tokens	General-purpose reasoning, tool use, structured output	Medium	Medium
GPT-5.4	1,050,000 tokens	Production-grade tasks with large context requirements	Medium-High	Medium
GPT-5.5	1,050,000 tokens	Latest reasoning capabilities with improved accuracy and instruction following	High	Medium
GPT-5.2-Pro	400,000 tokens	Deep reasoning, complex research, advanced code generation	High	Slow
GPT-5.4-Pro	1,050,000 tokens	Maximum reasoning depth with extended context	Highest	Slowest
GPT-5.2-Chat	400,000 tokens	Conversational interactions, Question and Answer, guidance	Medium-Low	Fast
GPT-5.4-Chat	1,050,000 tokens	Conversational interactions with larger context	Medium	Fast
GPT-5-mini	400,000 tokens	High-volume, cost-sensitive workloads	Low	Fast
GPT-5-nano	400,000 tokens	Ultra-low-cost, latency-sensitive workloads	Lowest	Fastest

Understand available Grok models

Grok models are available through Azure AI Foundry and provide an alternative for biology and life sciences research workflows. Compared to GPT models, Grok models have fewer content restrictions on biology-vertical queries, making them better suited for agents that reason over biomedical literature, molecular biology, genomics, and related domains where GPT safety filters might limit useful scientific output.

Model	Strength	Relative cost	Response time
Grok-4.20-reasoning	Chain-of-thought reasoning for complex biology and life sciences tasks	High	Slow
Grok-4.20-non-reasoning	Fast inference for biology-focused agents without chain-of-thought overhead	Medium	Fast

When to use Grok models

Biology and life sciences agents—Use Grok models when your agent handles queries about molecular structures, drug interactions, genetic pathways, or other biology-specific content where GPT models might refuse or over-filter responses.
Grok-4.20-reasoning—Choose the reasoning variant for agents that perform multi-step scientific reasoning, hypothesis generation, or complex analysis in biology domains. The chain-of-thought capability produces more thorough and explainable outputs.
Grok-4.20-non-reasoning—Choose the non-reasoning variant for high-throughput agents that need fast responses for biology-focused queries without the overhead of extended reasoning.

Note

Grok models are supported as bring-your-own-model (BYOM) deployments. Ensure the model is available in your target region and the corresponding quota is reserved. Check the Azure AI Foundry Model Catalog for current availability.

Match models to agent use cases

Different agent tasks have different requirements. Use the following guidance to select the right model for each agent in your project.

Prompt agents for research and analysis

Research agents handle literature review, data analysis, summarization, and scientific reasoning. These tasks require strong reasoning and accurate output.

Recommended models:

GPT-5.4 (default recommendation)—Provides strong reasoning with a large 1,050,000 token context window for most research tasks. It handles tool execution, structured output, and multi-step analysis well. Start here for most prompt agents.
GPT-5.5—The latest model in the GPT-5.x family with improved reasoning accuracy and instruction following. Choose GPT-5.5 for agents that require cutting-edge performance on complex scientific reasoning tasks. Available in limited regions (eastus2, northcentralus, southcentralus, westus3, polandcentral, swedencentral).
GPT-5.2—A cost-effective alternative when extended context isn't required. Offers reliable reasoning at a lower cost than GPT-5.4.
GPT-5.2-Pro / GPT-5.4-Pro—Use Pro variants for agents that perform deep scientific reasoning, complex hypothesis generation, or advanced code synthesis. Pro models allocate more compute per request and produce more thorough outputs. Expect higher cost and slower response times.

Prompt agents for planning and routing

Planning agents generate research plans, make routing decisions, or coordinate other agents. These tasks need consistent, deterministic behavior rather than creative reasoning.

Recommended models:

GPT-5.4—Handles planning and routing reliably. Set temperature to 0 for deterministic behavior.
GPT-5.2-Chat—A cost-effective alternative for lightweight planning tasks that don't require deep reasoning. Chat models respond faster and cost less per token.
GPT-5-mini—Suitable for simple routing decisions where the agent selects from a fixed set of options. Offers the best balance of cost and speed for straightforward classification tasks.

Prompt agents for tool execution

Tool-execution agents invoke Discovery tools, code interpreters, or Model Context Protocol (MCP) tools. They need reliable function calling and structured output generation.

Recommended models:

GPT-5.4—Offers consistent tool-calling behavior with a large context window. It reliably generates structured JSON for function arguments and parses tool responses accurately, while supporting many sequential tool calls.
GPT-5.2—A cost-effective alternative when tool operations don't involve large input payloads or extended context.

Prompt agents for user interaction

Interactive agents handle Question and Answer, onboarding, or exploratory conversations with researchers. They benefit from natural conversational tone and fast response times.

Recommended models:

GPT-5.2-Chat / GPT-5.4-Chat—Optimized for conversational interactions. Chat models provide natural, responsive dialogue, with lower latency and cost compared to their base counterparts.
GPT-5-mini—A strong choice for high-volume interactive scenarios if cost efficiency matters. Delivers good conversational quality at a fraction of the cost.

Multi-agent orchestration

When using the Discovery Engine for multi-agent orchestration, each prompt agent invoked by the engine uses its own model deployment. Apply the guidance in the previous sections to each prompt agent in your project.

For multi-agent scenarios, you can mix models across agents. For example, use GPT-5-mini for a routing agent, GPT-5.2 for a data-processing agent, and GPT-5.2-Pro for a synthesis agent. This approach optimizes cost without sacrificing output quality where it matters.

Evaluate tradeoffs between quality, cost, and speed

Use the following decision matrix to guide your model selection.

Priority	Recommended model	Tradeoff
Highest output quality	GPT-5.4-Pro	Slowest response, highest cost
Latest reasoning capabilities	GPT-5.5	Limited region availability
Best general-purpose balance	GPT-5.4	Good quality, large context, moderate cost
Large context requirements	GPT-5.4	Higher cost, supports up to 1,050,000 tokens
Biology and life sciences	Grok-4.20-reasoning	Fewer content restrictions on biology queries; BYOM deployment required
Fast biology inference	Grok-4.20-non-reasoning	Fewer content restrictions; no chain-of-thought reasoning
Fast conversational responses	GPT-5.2-Chat or GPT-5.4-Chat	Reduced reasoning depth
Cost optimization	GPT-5-mini	Good quality at lower cost
Lowest cost and latency	GPT-5-nano	Reduced output quality, best for simple tasks

Tips for optimizing cost

Start with GPT-5.4. It's the recommended default for Discovery agents. Move to a different model only when you have a specific reason.
Use smaller models for simple tasks. Routing, classification, and formatting tasks don't need Pro-level reasoning. GPT-5-mini or GPT-5-nano reduces cost significantly.
Reserve Pro models for high-value tasks. Deep research synthesis, complex hypothesis generation, and advanced code analysis justify the higher cost.
Mix models across agents. Assign different models to different agents based on each agent's task complexity.

Configure a model deployment for your agent

You configure model deployments at the workspace level. All agents in a project share these deployments.

Deploy a model as an Azure resource managed resource at the workspace level using Azure CLI, Bicep, or ARM templates. See Create a chat model deployment for detailed steps.
In Discovery Studio, create or edit a prompt agent.
Under Chat model, select the deployment name that corresponds to the model you want to use (for example, my-gpt-52-deployment).
Adjust Temperature and Top-P response controls based on your use case:
- For planning and routing agents, set Temperature to 0 for deterministic output.
- For research and analysis agents, use Temperature between 0.3 and 0.7 for balanced creativity and precision.
- For exploratory or brainstorming agents, set Temperature between 0.7 and 1.0.
Save the agent. Each save creates a new immutable version.

You can deploy multiple models in the same workspace and assign different deployments to different agents. Reference deployments by name, not resource ID.

Model selection in Discovery app

The model selection guidance in this article applies equally to Discovery app. The same principles for matching models to agent use cases, evaluating quality-cost-speed tradeoffs, and configuring response controls remain valid.

Bring your own model (BYOM)

Discovery app provides additional flexibility by allowing you to connect third-party model endpoints directly. In addition to models from the Foundry model catalog, Discovery app supports:

OpenAI endpoints—Connect directly to OpenAI API endpoints (for example, GPT-4o, GPT-5) using your own API keys
Anthropic endpoints—Connect to Anthropic Claude models directly
Other third-party platforms—Any model endpoint that follows standard API conventions

This flexibility enables you to:

Experiment with models not currently available in the Foundry catalog
Compare performance across different model providers
Use specialized models for domain-specific tasks

Important

When using third-party model endpoints in Discovery app, you are responsible for endpoint security, data handling, and compliance with your organization's policies. Microsoft Discovery model guidance remains the recommended baseline for production and team use.

Configuration in Discovery app

To configure a third-party model endpoint in Discovery app:

Open Discovery app settings.
Add a new model endpoint by providing the endpoint URL and authentication credentials.
Reference the configured endpoint when creating or editing your custom agent.

The same temperature, Top-P, and other response control parameters apply regardless of the model provider.

Feedback

Was this page helpful?

Last updated on 2026-06-04

Select models for agents in Microsoft Discovery

Applicability

Prerequisites

Understand available GPT-5.x models

Understand available Grok models

When to use Grok models

Match models to agent use cases

Prompt agents for research and analysis

Prompt agents for planning and routing

Prompt agents for tool execution

Prompt agents for user interaction

Multi-agent orchestration

Evaluate tradeoffs between quality, cost, and speed

Tips for optimizing cost

Configure a model deployment for your agent

Model selection in Discovery app

Bring your own model (BYOM)

Configuration in Discovery app

Related content

Feedback

Additional resources