- Latency / performance variability
Based on the provided information, the following points are supported:
- Latency in multitenant AI services is expected to vary. Document Intelligence, which is also a multitenant AI service, explicitly notes that:
- Latency varies by document size and content.
- Variability is inherent in microservice-based, stateless, asynchronous services, even when documents/requests are similar.
- Similar requests are expected to have comparable, but not identical, latency.
- Foundry’s agent monitoring guidance treats latency above about 10 seconds as a signal that something is wrong or constrained:
- Latency above 10 seconds may indicate model throttling, complex tool calls, or network issues.
- For Document Intelligence in Foundry Tools, the recommended latency troubleshooting steps are:
- Check Azure region status for outages or incidents affecting the region and the specific AI service.
- Check file/request size and normalize latency (for documents, latency per page) to determine whether the workload itself is driving latency.
- Check storage latency (for example, Azure Blob Storage) if the workload depends on it.
- Establish a latency baseline for the specific scenario and compare current behavior against that baseline.
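The normalization and baseline steps above can be sketched as follows; the function names and the 2× alert threshold are illustrative assumptions, not values from the documentation:

```python
from statistics import mean

def latency_per_page(total_latency_s: float, pages: int) -> float:
    """Normalize a request's latency by document size (seconds per page)."""
    return total_latency_s / max(pages, 1)

def exceeds_baseline(samples_s: list[float], baseline_s: float, factor: float = 2.0) -> bool:
    """Flag a scenario whose mean normalized latency exceeds the baseline by `factor`."""
    return mean(samples_s) > factor * baseline_s

# Example: 30 s for a 10-page document -> 3.0 s/page
normalized = latency_per_page(30.0, 10)

# Suppose the established baseline for this scenario is 1.2 s/page and
# recent samples trend well above it.
recent = [latency_per_page(t, p) for t, p in [(30.0, 10), (48.0, 12), (18.0, 9)]]
alert = exceeds_baseline(recent, baseline_s=1.2)
```

Normalizing per page separates "the workload got bigger" from "the service got slower", which is the distinction the troubleshooting steps are trying to make.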
These patterns apply conceptually to Foundry agents as well:
- Intermittent spikes from ~3–6 seconds to 25–40+ seconds are not the desired steady state, but variability is expected in shared, multitenant environments.
- Internal factors that can contribute include:
- Regional load and capacity in the chosen region.
- Orchestration overhead (agent reasoning, tool calls, retries).
- Downstream dependencies (for example, storage or other tools) that add latency or fail intermittently.
To monitor latency specifically at the Foundry agent level, the supported approach is:
- Use the Agent Monitoring Dashboard in Foundry (new):
- Navigate to Build → select the agent → Monitor tab.
- The dashboard exposes an explicit Latency metric for agent runs.
- Latency above 10 seconds is called out as a threshold that warrants investigation.
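Applied to exported run data, the 10-second threshold amounts to a simple filter; the record shape and field names below are illustrative, not the dashboard's actual schema:

```python
# Hypothetical run records as they might look once exported from monitoring
# data; field names are assumptions for illustration.
runs = [
    {"run_id": "r1", "latency_s": 4.2},
    {"run_id": "r2", "latency_s": 31.7},
    {"run_id": "r3", "latency_s": 5.9},
    {"run_id": "r4", "latency_s": 26.0},
]

# Threshold the guidance calls out as warranting investigation.
LATENCY_THRESHOLD_S = 10.0

def runs_to_investigate(records, threshold=LATENCY_THRESHOLD_S):
    """Return IDs of runs whose latency exceeds the investigation threshold."""
    return [r["run_id"] for r in records if r["latency_s"] > threshold]

flagged = runs_to_investigate(runs)  # ["r2", "r4"]
```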
- Use Application Insights metrics behind the scenes:
- Monitoring data for agents is stored in the connected Application Insights resource.
- Retention and billing follow the Application Insights configuration.
- For Document Intelligence–based tools, use Azure portal metrics:
- On the resource Overview → Monitoring, review Request latency.
- Under Monitoring → Metrics, add the Latency metric and adjust aggregation.
- For Blob or other storage dependencies, use storage latency metrics:
- In the storage account → Monitoring → Insights, review E2E and Server latency.
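As a quick sanity check on those storage metrics, the gap between end-to-end (E2E) latency and Server latency approximates time spent outside the storage service. A minimal sketch (the helper name and sample values are assumptions):

```python
def client_network_overhead_ms(e2e_ms: float, server_ms: float) -> float:
    """E2E latency covers server processing plus client/network time, so the
    difference approximates client- and network-side overhead."""
    return max(e2e_ms - server_ms, 0.0)

# Example: 180 ms end-to-end vs 25 ms server time -> ~155 ms spent outside
# the storage service (network, client-side retries, throttling backoff).
overhead = client_network_overhead_ms(180.0, 25.0)
```

A large and growing gap points at the network or client rather than the storage service itself.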
- Tracing / observability
The supported tracing model for Foundry agents and related components is:
- Foundry provides an observability platform for monitoring and tracing AI agents. Tracing captures:
- User inputs and agent outputs.
- Tool usage (calls and results).
- Token consumption.
- Time signals such as duration and latency.
- Key concepts:
- Traces represent the full journey of a request.
- Spans represent individual operations within a trace.
- Attributes enrich spans with metadata.
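The trace/span/attribute relationship can be illustrated with a minimal data model; this mirrors the concepts only and is not the OpenTelemetry API:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str                      # an individual operation within a trace
    duration_ms: float
    attributes: dict = field(default_factory=dict)  # metadata enriching the span

@dataclass
class Trace:
    trace_id: str                  # represents the full journey of one request
    spans: list = field(default_factory=list)

# One hypothetical agent run: user input -> tool call -> model response
t = Trace(trace_id="run-001", spans=[
    Span("agent.invoke", 4200.0, {"user_input": "summarize report"}),
    Span("tool.search", 900.0, {"tool": "file_search", "results": 3}),
    Span("model.generate", 2800.0, {"tokens": 512}),
])

total_ms = sum(s.duration_ms for s in t.spans)  # 7900.0
```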
- For agents built with Microsoft Agent Framework:
- When tracing is enabled for the Foundry project, agents automatically emit traces; no extra code or packages are required.
- To verify tracing:
- Run the agent at least once.
- In the Foundry portal, go to Observability → Traces.
- Confirm that a new trace appears with spans for the agent’s operations.
- Traces typically appear within 2–5 minutes after execution.
- For agents built with LangChain or OpenAI Agents SDK:
- Tracing requires explicit OpenTelemetry setup and an exporter to Application Insights.
- For OpenAI Agents SDK, the documented pattern is:
- Install opentelemetry-sdk, opentelemetry-instrumentation-openai-agents, and azure-monitor-opentelemetry-exporter.
- Configure OpenTelemetry to export to Azure Monitor using APPLICATION_INSIGHTS_CONNECTION_STRING.
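A bootstrap for that pattern might look like the sketch below. The TracerProvider, BatchSpanProcessor, and AzureMonitorTraceExporter usage follows the named packages, but the instrumentor class name for the OpenAI Agents SDK is an assumption and should be checked against the current package documentation:

```python
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter

# Export spans to the Application Insights resource connected to the project.
exporter = AzureMonitorTraceExporter(
    connection_string=os.environ["APPLICATION_INSIGHTS_CONNECTION_STRING"]
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# Instrument the OpenAI Agents SDK. The instrumentor class name below is an
# assumption -- verify it against opentelemetry-instrumentation-openai-agents.
from opentelemetry.instrumentation.openai_agents import OpenAIAgentsInstrumentor
OpenAIAgentsInstrumentor().instrument()
```

With this in place, agent runs should surface both in the Foundry Tracing view and in Application Insights.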
- Viewing traces:
- In Foundry: use the Tracing view in the portal to inspect traces and spans.
- In Azure Monitor Application Insights: use End-to-end transaction details to investigate traces in more depth.
Regarding preview vs. API calls and intermittent visibility:
- Tracing is generally available for prompt agents; workflow, hosted, and custom agents are in preview.
- Items marked preview are provided without an SLA and may have constrained capabilities, including trace behavior.
- The documentation notes that:
- Traces appear with a delay (2–5 minutes).
- Preview features may not be suitable for production workloads.
The reliable approach, based on the available guidance, is therefore:
- Ensure tracing is enabled at the Foundry project level and that the project is correctly connected to an Application Insights resource.
- For Microsoft Agent Framework–based agents, rely on the native integration (no extra code) and verify traces in Observability → Traces and in Application Insights.
- For other frameworks (for example, OpenAI Agents SDK, LangChain), configure OpenTelemetry with the Azure Monitor exporter and ensure APPLICATION_INSIGHTS_CONNECTION_STRING is set so traces are consistently exported to Application Insights.
- Use Application Insights as the authoritative backend for traces, especially when portal views are in preview or when there are UI-level inconsistencies.
The provided context does not list specific known bugs about intermittent trace loss, but it does emphasize that preview features have no SLA and may have constrained or unsupported capabilities, which includes observability surfaces.
References:
- Troubleshooting latency issues in Azure Document Intelligence in Foundry Tools
- Check monitoring metrics for your resource
- Check Azure region status
- Monitor agents with the Agent Monitoring Dashboard
- Agent tracing overview
- Trace and observe AI agents in Microsoft Foundry (preview) (classic)
- Configure tracing for AI agent frameworks (preview)