
Azure AI Search – Significant Latency Difference Across Client Machines (Same Region & Network)

Sreshta Talluri 0 Reputation points
2026-05-29T19:47:06.9533333+00:00

Description: We’re observing inconsistent query latency when calling our Azure AI Search service from different client machines, and we’d like help diagnosing the issue.

Details:

  • Service: Azure AI Search
  • Tier: Standard S1
  • Region: West US 2
  • Access Method: Azure REST API
  • Network Setup: Private network configuration is already in place (endpoints and connectivity verified)

Observed Behavior:

  • From one client machine (in California), search queries consistently complete in approximately 80–150 ms.
  • From another client machine (also in California, similar network context), identical queries are taking 500 ms or more.
  • Both clients are targeting the same search service, index, and API endpoints.

Expected Behavior: We expect query latency to be relatively consistent across client machines given similar geographic location and network setup.

What We’ve Checked So Far:

  • Service health appears normal.
  • Query payloads and API usage are identical across both clients.
  • No intentional throttling or scaling differences on the service side.
  • Private networking is configured and functioning.

Request for Support: Could you help us investigate potential causes for this discrepancy? Specifically:

  • Whether there are known latency differences due to routing, private endpoints, or DNS resolution.
  • Any diagnostics or logs on the Azure AI Search side that could indicate slower request handling for specific clients.
  • Recommendations for debugging client-side vs network vs service-level latency issues.

We can provide additional details (e.g., request/response logs, timing breakdowns, metrics, or trace IDs) if needed.

Azure AI Search

An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.


1 answer

  1. AI answer

    2026-05-29T19:47:34.2433333+00:00

    Azure AI Search query latency consists of two main components:

    1. Time spent in the search service processing the query.
    2. Network round-trip time (RTT) between the client and the service.

    The steps below isolate and analyze each component.

    To distinguish client/network vs. service latency:

    1. Use the elapsed-time header from Azure AI Search responses
      • Execute the same REST query from both client machines.
      • Inspect the response headers and capture the elapsed-time value (in milliseconds) returned by the service.
      • Also capture the total round-trip duration reported by the REST client (for example, the duration shown by the tool or HTTP library).
      • The difference between the round-trip duration and elapsed-time is the network + client overhead.
      • Example from the documentation:
        • elapsed-time (service processing): 21 ms
        • Round-trip duration: 125 ms
        • Network + client overhead: 104 ms
      • Apply the same method on both machines:
        • If elapsed-time is similar (for example, ~20–40 ms) but total duration differs (80–150 ms vs. 500+ ms), the discrepancy is due to network/client-side factors.
        • If elapsed-time itself is significantly higher for one client, that indicates the service is taking longer for those requests and further investigation of service-side load and background processing is needed.
    2. Check for query throttling or high query volume
      • Use Azure Monitor logs and Kusto queries to examine query rates and average duration:
        • Look at queries per minute (QPM) and average duration (AvgDurationMS) to see if there are periods where the service is under higher load, which can increase latency.
      • If the slower client is issuing queries during a period of higher QPM or when other workloads are active, this can explain higher latency.
    3. Consider indexing and background processing impact
      • Indexing operations share resources with queries. High indexing activity can increase query latency.
      • Use Kusto queries against the search service's Azure Monitor logs to visualize:
        • Average query latency over time.
        • Queries per minute.
        • Indexing operations per minute.
      • Correlate the time windows when the slower client runs queries with indexing spikes or shard merge operations:
        • If indexing or shard merges are active, short-term latency spikes are expected.
        • The documentation notes that indexing and shard merges can cause temporary latency increases while the service completes resource-intensive background work.
    4. Network path and region considerations
      • Even within the same geographic area (California), network paths can differ between client machines (different ISPs, routes, or peering), affecting RTT.
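      • Azure's published network latency statistics give typical RTT ranges between Azure regions (for example, US-to-US regional pairs often have 26–50 ms RTT). These figures describe region-to-region traffic rather than client-to-region paths, so treat them only as a rough baseline: both clients are in California and target West US 2, yet their effective RTT can still differ depending on routing.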
      • If one client’s RTT to West US 2 is higher due to its network path, this will show up as a larger difference between round-trip duration and elapsed-time.
    5. Recommended diagnostic steps
      1. On both clients, run the same REST query multiple times and record:
        • elapsed-time header.
        • Total round-trip duration.
        • Time of day.
      2. In Azure Monitor:
        • Plot average query latency and QPM for the same time windows.
        • Plot indexing operations per minute.
        • Look for correlation between higher latency and indexing or high QPM.
      3. If elapsed-time is consistently low and similar on both clients but total duration differs:
        • Focus on client/network:
          • Compare DNS resolution paths and ensure both clients resolve the same endpoint and IP.
          • Check for local network constraints (VPN, proxies, firewalls) on the slower client.
      4. If elapsed-time is higher for the slower client’s requests:
        • Correlate with indexing and background processing (shard merges) as described above.
        • Consider scheduling heavy indexing jobs during off-peak hours or scaling the search service if sustained load is high.
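
The measurement in step 5.1 can be sketched in Python roughly as follows. This is a minimal illustration, not an official client: it assumes api-key authentication (the thread does not say which auth method is in use) and only the standard library. The names `overhead_ms` and `timed_search`, and the URL placeholders in the usage comment, are hypothetical. The `elapsed-time` response header is the one described in step 1.

```python
import json
import time
import urllib.request


def overhead_ms(round_trip_ms: float, elapsed_time_ms: float) -> float:
    """Network + client overhead: total round trip minus service processing time."""
    return round_trip_ms - elapsed_time_ms


def timed_search(url: str, api_key: str, body: dict) -> dict:
    """POST one query and report service time, round-trip time, and the gap between them.

    Assumption: the service accepts an `api-key` header; adapt if you use another auth method.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # include the body download in the round trip
        header = response.headers.get("elapsed-time")
    round_trip = (time.perf_counter() - start) * 1000.0
    # If the header is missing, report NaN rather than guessing a value.
    elapsed = float(header) if header is not None else float("nan")
    return {
        "elapsed_time_ms": elapsed,
        "round_trip_ms": round_trip,
        "overhead_ms": overhead_ms(round_trip, elapsed),
    }


# Usage (placeholders are hypothetical; fill in your own service, index, and API version):
# url = "https://<service>.search.windows.net/indexes/<index>/docs/search?api-version=<api-version>"
# for _ in range(10):
#     print(time.strftime("%H:%M:%S"), timed_search(url, "<query-key>", {"search": "*", "top": 1}))
```

Note that `urllib.request` opens a new connection for every call, so each sample here includes DNS lookup plus TCP and TLS setup. If one client application reuses connections and the other does not, that alone can account for part of the difference in round-trip time, so it is worth running the same script unchanged on both machines.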

    These steps allow separating service-side processing time from client/network latency and identifying whether the discrepancy is due to network routing, background indexing/merge operations, or overall query volume.
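
As an illustration of that separation, the comparison between two clients could be reduced to a small helper like the one below. It is only a sketch: the function name `diagnose`, the 20 ms tolerance, and the sample numbers in the usage comment are assumptions chosen for the example, not values from the documentation.

```python
from statistics import median


def diagnose(fast_samples, slow_samples, tolerance_ms=20.0):
    """Compare two clients' samples of (elapsed_time_ms, round_trip_ms).

    Returns "service" if the slow client's median elapsed-time is clearly higher,
    "network/client" if only its median overhead (round trip minus elapsed-time)
    is clearly higher, and "no significant difference" otherwise.
    The tolerance is an arbitrary illustrative threshold.
    """
    fast_service = median(e for e, _ in fast_samples)
    slow_service = median(e for e, _ in slow_samples)
    fast_overhead = median(t - e for e, t in fast_samples)
    slow_overhead = median(t - e for e, t in slow_samples)

    if slow_service - fast_service > tolerance_ms:
        return "service"
    if slow_overhead - fast_overhead > tolerance_ms:
        return "network/client"
    return "no significant difference"


# Usage with made-up samples that mirror the scenario in the question:
# fast = [(25, 110), (30, 120), (22, 95)]
# slow = [(28, 520), (31, 560), (24, 505)]
# diagnose(fast, slow)
```

Medians are used instead of means so that a single cold-start or retried request does not dominate the comparison.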

