Foundry Latency and Tracing Inconsistencies - Guidance Needed

Deepak 0 Reputation points
2026-04-08T20:54:03.3933333+00:00

Hi Team,

We are currently using Azure AI Foundry (East US) with an Azure AI Search–backed Knowledge Base, integrated with a frontend application. We are observing a few inconsistent behaviors and would appreciate your guidance.

1. Latency / Performance Variability

We are seeing intermittent latency issues:

  • Response time varies significantly for the same queries  
  • Typical response time: ~3–6 seconds  
  • During impacted periods: ~25–40+ seconds  
  • Occurs mostly during afternoons and sometimes on weekends; in some cases, the agent attempts to invoke the tool but fails with an error such as “unable to connect”
  • User load is minimal (≤ 5 users)  
  • No changes to configuration or data between runs  

In comparison, querying the underlying search index/knowledge base directly appears more consistent and faster.

Questions:

- Is this level of intermittent latency expected for Foundry agents in shared environments?  

- Are there internal factors (e.g., model capacity, instructions/orchestration overhead, or regional load in East US) that could cause this behavior?  

- What is the recommended way to monitor latency specifically at the Foundry agent level?

2. Tracing / Observability

The Foundry preview previously showed detailed traces (input/output, tool calls, token usage). Since then:

  • For frontend/API calls, traces were not initially visible
  • After reconnecting Application Insights, we can sometimes see frontend traces in Foundry, but this is inconsistent
  • At the same time, preview traces have become unavailable

Questions:

- What is the recommended and reliable approach for tracing both preview and API calls?  

- Are there any known issues with trace visibility or intermittent loss of traces?

Additional Context (FYI)

  • Region: East US  
  • Models:
    • GPT-5.2 (Foundry agent)
    • GPT-5 (Knowledge Base)

We can share a few trace IDs if that helps. Thanks in advance for your guidance.

Thanks!

Azure AI Search

An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.


1 answer

  1. Q&A Assist
    2026-04-08T20:54:26.3933333+00:00
    AI generated content. This answer was created with AI from Q&A Assist.
    1. Latency / performance variability

    Based on the provided information, the following points are supported:

    • Latency in multitenant AI services is expected to vary. Document Intelligence, which is also a multitenant AI service, explicitly notes that:
      • Latency varies by document size and content.
      • Variability is inherent in microservice-based, stateless, asynchronous services, even when documents/requests are similar.
      • Similar requests are expected to have comparable, but not identical, latency.
    • Foundry’s agent monitoring guidance treats latency above about 10 seconds as a signal that something is wrong or constrained:
      • Latency above 10 seconds may indicate model throttling, complex tool calls, or network issues.
    • For Document Intelligence in Foundry Tools, the recommended latency troubleshooting steps are:
      1. Check Azure region status for outages or incidents affecting the region and the specific AI service.
      2. Check file/request size and normalize latency (for documents, latency per page) to determine whether the workload itself is driving latency.
      3. Check storage latency (for example, Azure Blob Storage) if the workload depends on it.
      4. Establish a latency baseline for the specific scenario and compare current behavior against that baseline (a minimal sketch follows this list).
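    To make step 4 concrete, the sketch below times repeated identical queries and reports percentiles, giving a baseline to compare impacted periods against. `query_agent` is a hypothetical placeholder for however you invoke the agent (SDK call or REST request), not a real Foundry API:

    ```python
    import statistics
    import time

    def query_agent(prompt: str) -> str:
        """Hypothetical placeholder: replace with your actual agent invocation."""
        raise NotImplementedError

    def measure_baseline(prompt: str, runs: int = 20) -> None:
        # Time the same query repeatedly so the workload itself is constant
        # and any variability comes from the service side.
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            query_agent(prompt)
            samples.append(time.perf_counter() - start)
        samples.sort()
        p95 = samples[int(0.95 * (len(samples) - 1))]
        print(f"runs={runs} "
              f"p50={statistics.median(samples):.2f}s "
              f"p95={p95:.2f}s max={samples[-1]:.2f}s")

    # Example: measure_baseline("What is our refund policy?")
    ```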

    These patterns apply conceptually to Foundry agents as well:

    • Intermittent spikes from ~3–6 seconds to 25–40+ seconds are not the desired steady state, but variability is expected in shared, multitenant environments.
    • Internal factors that can contribute include:
      • Regional load and capacity in the chosen region.
      • Orchestration overhead (agent reasoning, tool calls, retries).
      • Downstream dependencies (for example, storage or other tools) that add latency or fail intermittently.

    To monitor latency specifically at the Foundry agent level, the supported approach is:

    1. Use the Agent Monitoring Dashboard in Foundry (new):
      • Navigate to Build → select the agent → Monitor tab.
      • The dashboard exposes an explicit Latency metric for agent runs.
      • Latency above 10 seconds is called out as a threshold that warrants investigation.
    2. Use Application Insights metrics behind the scenes (see the query sketch after this list):
      • Monitoring data for agents is stored in the connected Application Insights resource.
      • Retention and billing follow the Application Insights configuration.
    3. For Document Intelligence–based tools, use Azure portal metrics:
      • On the resource Overview → Monitoring, review Request latency.
      • Under Monitoring → Metrics, add the Latency metric and adjust aggregation.
    4. For Blob or other storage dependencies, use storage latency metrics:
      • In the storage account → Monitoring → Insights, review E2E and Server latency.
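    As a concrete example of step 2, the sketch below pulls hourly request-duration percentiles from the connected Application Insights data using the azure-monitor-query package. It assumes a workspace-based Application Insights resource (so request telemetry lands in the AppRequests table) and uses a placeholder workspace ID:

    ```python
    from datetime import timedelta

    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    WORKSPACE_ID = "<your-log-analytics-workspace-id>"  # placeholder

    client = LogsQueryClient(DefaultAzureCredential())

    # Hourly p50/p95/p99 of request duration over the last 3 days; afternoon
    # or weekend spikes like the reported 25-40+ second runs should stand out.
    query = """
    AppRequests
    | summarize p50 = percentile(DurationMs, 50),
                p95 = percentile(DurationMs, 95),
                p99 = percentile(DurationMs, 99)
              by bin(TimeGenerated, 1h)
    | order by TimeGenerated asc
    """

    response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(days=3))
    for table in response.tables:
        for row in table.rows:
            print(row)
    ```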
    2. Tracing / observability

    The supported tracing model for Foundry agents and related components is:

    • Foundry provides an observability platform for monitoring and tracing AI agents. Tracing captures:
      • User inputs and agent outputs.
      • Tool usage (calls and results).
      • Token consumption.
      • Time signals such as duration and latency.
    • Key concepts:
      • Traces represent the full journey of a request.
      • Spans represent individual operations within a trace.
      • Attributes enrich spans with metadata.
    • For agents built with Microsoft Agent Framework:
      • When tracing is enabled for the Foundry project, agents automatically emit traces; no extra code or packages are required.
      • To verify tracing:
        1. Run the agent at least once.
        2. In the Foundry portal, go to Observability → Traces.
        3. Confirm that a new trace appears with spans for the agent’s operations.
      • Traces typically appear within 2–5 minutes after execution.
    • For agents built with LangChain or OpenAI Agents SDK:
      • Tracing requires explicit OpenTelemetry setup and an exporter to Application Insights.
      • For OpenAI Agents SDK, the documented pattern is:
        • Install opentelemetry-sdk, opentelemetry-instrumentation-openai-agents, and azure-monitor-opentelemetry-exporter.
        • Configure OpenTelemetry to export to Azure Monitor using APPLICATION_INSIGHTS_CONNECTION_STRING (a minimal sketch follows this list).
    • Viewing traces:
      • In Foundry: use the Tracing view in the portal to inspect traces and spans.
      • In Azure Monitor Application Insights: use End-to-end transaction details to investigate traces in more depth.
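    As referenced in the OpenAI Agents SDK item above, here is a minimal sketch of that OpenTelemetry wiring. The opentelemetry-sdk classes and AzureMonitorTraceExporter are real APIs; the instrumentor hookup at the end is an assumption about what opentelemetry-instrumentation-openai-agents exposes (the usual *Instrumentor pattern), so check that package's documentation for the exact name:

    ```python
    import os

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter

    # Export spans to the Application Insights resource connected to the project.
    exporter = AzureMonitorTraceExporter(
        connection_string=os.environ["APPLICATION_INSIGHTS_CONNECTION_STRING"]
    )
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    # Hypothetical instrumentor hookup; adjust to the package's actual API:
    # from opentelemetry.instrumentation.openai_agents import OpenAIAgentsInstrumentor
    # OpenAIAgentsInstrumentor().instrument()

    # Custom spans/attributes around agent calls also help correlate frontend
    # requests with agent runs when portal views are inconsistent.
    tracer = trace.get_tracer("frontend-app")
    with tracer.start_as_current_span("agent_call") as span:
        span.set_attribute("app.user_query_id", "example-123")  # illustrative attribute
        # ... invoke the agent here ...
    ```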

    Regarding preview vs. API calls and intermittent visibility:

    • Tracing is generally available for prompt agents; workflow, hosted, and custom agents are in preview.
    • Items marked preview are provided without an SLA and may have constrained capabilities, including trace behavior.
    • The documentation notes that:
      • Traces appear with a delay (2–5 minutes).
      • Preview features may not be suitable for production workloads.

    The reliable approach, based on the available guidance, is therefore:

    1. Ensure tracing is enabled at the Foundry project level and that the project is correctly connected to an Application Insights resource.
    2. For Microsoft Agent Framework–based agents, rely on the native integration (no extra code) and verify traces in Observability → Traces and in Application Insights.
    3. For other frameworks (for example, OpenAI Agents SDK, LangChain), configure OpenTelemetry with the Azure Monitor exporter and ensure APPLICATION_INSIGHTS_CONNECTION_STRING is set so traces are consistently exported to Application Insights (see the one-call setup sketch below).
    4. Use Application Insights as the authoritative backend for traces, especially when portal views are in preview or when there are UI-level inconsistencies.
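    If you prefer a single call over the manual exporter wiring sketched earlier, the azure-monitor-opentelemetry distro package sets up the tracer provider and the Azure Monitor exporter together; passing the connection string explicitly avoids depending on a particular environment-variable name:

    ```python
    import os

    from azure.monitor.opentelemetry import configure_azure_monitor

    # One-call setup: wires OpenTelemetry tracing to the Application Insights
    # resource identified by this connection string.
    configure_azure_monitor(
        connection_string=os.environ["APPLICATION_INSIGHTS_CONNECTION_STRING"]
    )
    ```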

    The provided context does not list specific known bugs about intermittent trace loss, but it does emphasize that preview features have no SLA and may have constrained or unsupported capabilities, which includes observability surfaces.



