Azure AI Search query latency consists of two main components:
- Time spent in the search service processing the query.
- Network round-trip time (RTT) between the client and the service.
The steps below isolate and analyze each component.
To distinguish client/network vs. service latency:
- Use the `elapsed-time` header from Azure AI Search responses
  - Execute the same REST query from both client machines.
  - Inspect the response headers and capture the `elapsed-time` value (in milliseconds) returned by the service.
  - Also capture the total round-trip duration reported by the REST client (for example, the duration shown by the tool or HTTP library).
  - The difference between the round-trip duration and `elapsed-time` is the network + client overhead.
  - Example from the documentation:
    - `elapsed-time` (service processing): 21 ms
    - Round-trip duration: 125 ms
    - Network + client overhead: 104 ms
  - Apply the same method on both machines:
    - If `elapsed-time` is similar (for example, ~20–40 ms) but total duration differs (80–150 ms vs. 500+ ms), the discrepancy is due to network/client-side factors.
    - If `elapsed-time` itself is significantly higher for one client, that indicates the service is taking longer for those requests and further investigation of service-side load and background processing is needed.
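The subtraction described above can be sketched in Python. This is a minimal illustration, not an official client: `timed_query` and `network_overhead_ms` are hypothetical helper names, and `url` is assumed to be the full Search Documents REST endpoint for your index (including its `api-version` query parameter), authenticated with an `api-key` header.

```python
import json
import time
import urllib.request


def network_overhead_ms(round_trip_ms: float, elapsed_time_ms: float) -> float:
    """Round-trip duration minus service processing time."""
    return round_trip_ms - elapsed_time_ms


def timed_query(url: str, api_key: str, body: dict) -> dict:
    """POST one search request and return the latency breakdown in ms.

    Assumption: `url` is the complete query endpoint for an index and
    `body` is a valid search request payload for that index.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=30) as response:
        response.read()  # include the body transfer in the round trip
        elapsed_header = response.headers.get("elapsed-time")
    round_trip_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_header is None:
        raise ValueError("response has no elapsed-time header")
    elapsed_ms = float(elapsed_header)
    return {
        "elapsed_time_ms": elapsed_ms,
        "round_trip_ms": round_trip_ms,
        "overhead_ms": network_overhead_ms(round_trip_ms, elapsed_ms),
    }
```

With the documentation's numbers, `network_overhead_ms(125, 21)` gives 104. Running `timed_query` with the same request body on both machines yields directly comparable breakdowns.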
- Check for query throttling or high query volume
  - Use Azure Monitor logs and Kusto queries to examine query rates and average duration:
    - Look at queries per minute (QPM) and average duration (AvgDurationMS) to see if there are periods where the service is under higher load, which can increase latency.
    - If the slower client is issuing queries during a period of higher QPM or when other workloads are active, this can explain higher latency.
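If the query log rows are exported for offline analysis, QPM and average duration can be computed as in the sketch below. The record shape `(timestamp, duration_ms)` and the sample values are assumptions for illustration, not the actual Azure Monitor schema.

```python
from collections import defaultdict
from datetime import datetime


def per_minute_stats(records):
    """Group (timestamp, duration_ms) records by minute.

    Returns {minute: (query_count, average_duration_ms)}.
    """
    buckets = defaultdict(list)
    for timestamp, duration_ms in records:
        # Truncate each timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        buckets[minute].append(duration_ms)
    return {
        minute: (len(durations), sum(durations) / len(durations))
        for minute, durations in sorted(buckets.items())
    }


# Hypothetical sample data, not real measurements.
sample = [
    (datetime(2024, 1, 1, 10, 0, 5), 20.0),
    (datetime(2024, 1, 1, 10, 0, 40), 30.0),
    (datetime(2024, 1, 1, 10, 1, 10), 90.0),
]
stats = per_minute_stats(sample)
for minute, (qpm, avg_ms) in stats.items():
    print(minute, qpm, avg_ms)
```

Minutes with both a high count and a high average are the ones to compare against the times when the slower client was sending queries.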
- Consider indexing and background processing impact
  - Indexing operations share resources with queries. High indexing activity can increase query latency.
  - Use the provided Kusto queries to visualize:
    - Average query latency over time.
    - Queries per minute.
    - Indexing operations per minute.
  - Correlate the time windows when the slower client runs queries with indexing spikes or shard merge operations:
    - If indexing or shard merges are active, short-term latency spikes are expected.
    - The documentation notes that indexing and shard merges can cause temporary latency increases while the service completes resource-intensive background work.
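One simple way to check the correlation described above is to split minutes into "busy indexing" and "quiet" groups and compare mean query latency in each. The sketch below assumes per-minute values have already been extracted; the function name, the threshold, and the sample numbers are all hypothetical.

```python
from statistics import mean


def latency_by_indexing_load(latency_ms, indexing_ops, busy_threshold):
    """Compare mean query latency in busy-indexing vs. quiet minutes.

    latency_ms and indexing_ops are dicts keyed by the same minute labels.
    Returns (mean_latency_busy, mean_latency_quiet); either value is None
    when no minute falls into that group.
    """
    busy, quiet = [], []
    for minute, latency in latency_ms.items():
        ops = indexing_ops.get(minute, 0)
        (busy if ops >= busy_threshold else quiet).append(latency)
    return (mean(busy) if busy else None, mean(quiet) if quiet else None)


# Hypothetical per-minute values, not real measurements.
latency = {"10:00": 25.0, "10:01": 30.0, "10:02": 140.0, "10:03": 160.0}
indexing = {"10:00": 0, "10:01": 5, "10:02": 400, "10:03": 350}
busy_mean, quiet_mean = latency_by_indexing_load(
    latency, indexing, busy_threshold=100
)
print(busy_mean, quiet_mean)
```

A large gap between the two means suggests that indexing activity, rather than the client, accounts for the slower requests in those windows.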
- Network path and region considerations
  - Even within the same geographic area (California), network paths can differ between client machines (different ISPs, routes, or peering), affecting RTT.
  - The Azure network latency statistics show typical inter-region RTT ranges (for example, US-to-US regional pairs often have 26–50 ms RTT). While both clients are in California and targeting West US 2, their effective RTT can still differ depending on routing.
  - If one client’s RTT to West US 2 is higher due to its network path, this will show up as a larger difference between round-trip duration and `elapsed-time`.
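As a rough proxy for each client's RTT to the service endpoint, the time to open a TCP connection can be measured from both machines. This is only a sketch: `tcp_connect_ms` is a hypothetical helper, the measurement also includes DNS resolution on each attempt, and it is not a substitute for a proper network trace.

```python
import socket
import time
from statistics import median


def tcp_connect_ms(host: str, port: int = 443, samples: int = 5,
                   timeout: float = 5.0) -> float:
    """Median time in ms to open a TCP connection to host:port.

    The TCP handshake takes roughly one network round trip, so this
    approximates RTT (plus name resolution) from the current machine.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection is closed immediately after the handshake
        timings.append((time.perf_counter() - start) * 1000.0)
    return median(timings)
```

Calling it with the search service's host name on both clients gives two numbers to compare; a much larger value on the slower client points at the network path rather than the service.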
- Recommended diagnostic steps
  - On both clients, run the same REST query multiple times and record:
    - `elapsed-time` header.
    - Total round-trip duration.
    - Time of day.
  - In Azure Monitor:
    - Plot average query latency and QPM for the same time windows.
    - Plot indexing operations per minute.
    - Look for correlation between higher latency and indexing or high QPM.
  - If `elapsed-time` is consistently low and similar on both clients but total duration differs:
    - Focus on client/network:
      - Compare DNS resolution paths and ensure both clients resolve the same endpoint and IP.
      - Check for local network constraints (VPN, proxies, firewalls) on the slower client.
  - If `elapsed-time` is higher for the slower client’s requests:
    - Correlate with indexing and background processing (shard merges) as described above.
    - Consider scheduling heavy indexing jobs during off-peak hours or scaling the search service if sustained load is high.
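The decision rule in these steps can be expressed as a small function over the recorded samples. This is a sketch under stated assumptions: each sample is an `(elapsed_time_ms, round_trip_ms)` pair, `diagnose` is a hypothetical name, and the 50 ms cutoff for a "significantly higher" `elapsed-time` is an arbitrary illustrative threshold, not a documented value.

```python
from statistics import median


def diagnose(fast_samples, slow_samples, service_gap_ms=50.0):
    """Classify the source of extra latency on the slower client.

    Each sample is (elapsed_time_ms, round_trip_ms). service_gap_ms is an
    assumed threshold for a significant difference in elapsed-time.
    """
    fast_elapsed = median(e for e, _ in fast_samples)
    slow_elapsed = median(e for e, _ in slow_samples)
    fast_overhead = median(r - e for e, r in fast_samples)
    slow_overhead = median(r - e for e, r in slow_samples)
    if slow_elapsed - fast_elapsed >= service_gap_ms:
        return "service"  # service took longer for the slower client
    if slow_overhead > fast_overhead:
        return "network/client"  # similar service time, larger overhead
    return "inconclusive"


# Hypothetical samples, not real measurements.
fast = [(21, 125), (25, 110), (30, 140)]
slow_network = [(22, 520), (28, 610), (35, 560)]
slow_service = [(300, 420), (280, 400), (350, 460)]
print(diagnose(fast, slow_network))
print(diagnose(fast, slow_service))
```

Medians are used instead of means so that a single outlier request, such as one that coincides with a shard merge, does not dominate the comparison.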
These steps allow separating service-side processing time from client/network latency and identifying whether the discrepancy is due to network routing, background indexing/merge operations, or overall query volume.