Azure AI Search – Significant Latency Difference Across Client Machines (Same Region & Network)

Question

Sreshta Talluri

Description: We’re observing inconsistent query latency when calling our Azure AI Search service from different client machines, and we’d like help diagnosing the issue.

Details:

Service: Azure AI Search
Tier: Standard S1
Region: West US 2
Access Method: Azure REST API
Network Setup: Private network configuration is already in place (endpoints and connectivity verified)

Observed Behavior:

From one client machine (in California), search queries consistently complete in approximately 80–150 ms.
From another client machine (also in California, similar network context), identical queries are taking 500 ms or more.
Both clients are targeting the same search service, index, and API endpoints.

Expected Behavior: We expect query latency to be relatively consistent across client machines, given their similar geographic location and network setup.

What We’ve Checked So Far:

Service health appears normal.
Query payloads and API usage are identical across both clients.
No intentional throttling or scaling differences on the service side.
Private networking is configured and functioning.

Request for Support: Could you help us investigate potential causes for this discrepancy? Specifically:

Whether there are known latency differences due to routing, private endpoints, or DNS resolution.
Any diagnostics or logs on the Azure AI Search side that could indicate slower request handling for specific clients.
Recommendations for debugging client-side vs network vs service-level latency issues.

We can provide additional details (e.g., request/response logs, timing breakdowns, metrics, or trace IDs) if needed.

Karnam Venkata Rajeswari 3,240 Reputation points Microsoft External Staff Moderator

2026-05-29T19:57:45.7966667+00:00
Hello @Sreshta Talluri ,

Welcome to Microsoft Q&A. Thank you for reaching out to us.

Even when two clients use the same region, endpoint, index, and REST payload, end-to-end latency can still differ because the request path includes multiple layers such as DNS, network transfer, TLS setup, and service execution. Latency can be introduced at any point in the request and response chain, and DNS configuration is especially critical when private endpoints are used.

As a first step, please confirm whether the difference is consistent or intermittent; this distinction will help narrow the investigation quickly:

If one client is consistently slower, the more likely causes are DNS resolution differences, routing asymmetry, proxy/firewall inspection, or client configuration differences.

If both clients are occasionally slow, then transient network conditions or service-side variability should also remain in scope.
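To make the consistent-versus-intermittent call less subjective, repeated runs from each machine can be compared as distributions rather than as single measurements. The minimal Python sketch below assumes you have collected end-to-end latencies in milliseconds from each client. The function names and the rule "the slower client's 10th percentile is above the faster client's 90th percentile" are hypothetical heuristics chosen for illustration, not an Azure-defined threshold.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile (pct in 0-100) of a non-empty list of latencies in ms."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def classify_difference(client_a_ms, client_b_ms):
    """Return 'consistent' or 'intermittent' for the latency gap between two clients.

    Hypothetical heuristic: the gap counts as consistent when even the fast
    end (p10) of the slower client is above the slow end (p90) of the faster
    client, i.e. the two distributions barely overlap.
    """
    if statistics.median(client_a_ms) >= statistics.median(client_b_ms):
        slower, faster = client_a_ms, client_b_ms
    else:
        slower, faster = client_b_ms, client_a_ms
    if percentile(slower, 10) > percentile(faster, 90):
        return "consistent"
    return "intermittent"


if __name__ == "__main__":
    # Made-up samples shaped like the figures in the question.
    fast_client = [80, 95, 110, 120, 150]
    slow_client = [500, 520, 540, 600, 650]
    print(classify_difference(fast_client, slow_client))  # consistent
```

A "consistent" result points toward the DNS, routing, proxy, or client-configuration causes; "intermittent" keeps transient network conditions and service-side variability in scope. With only a handful of samples the percentiles are noisy, so a few dozen runs per machine give a more reliable answer.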

Validating DNS resolution and private endpoint behavior

Because the service is accessed through a private endpoint, DNS is one of the most important checks. DNS must be configured so that the service host name resolves to the private endpoint IP, and the default public DNS behavior may need to be overridden for private connectivity to work correctly. Please check whether both clients resolve the search endpoint to the same private IP address:
nslookup <search-service-name>.search.windows.net
Also review whether there are any differences in:

DNS forwarders or resolvers

private DNS zone linkage

cached DNS entries or TTL behavior

split-horizon DNS behavior

If the two machines resolve differently, the requests may be taking different network paths before ever reaching the service.
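The same check can be scripted so that both machines report their results in one format. The Python sketch below uses only the standard library: it lists the addresses the operating system resolver returns for the search host name and checks whether all of them fall in a private range, which is what you would expect when the private DNS zone is applied. Unlike nslookup, which queries a DNS server directly, it goes through the OS resolver the way applications do, so hosts-file entries and local resolver configuration are reflected too. The script name and command-line usage are illustrative assumptions.

```python
import ipaddress
import socket
import sys


def resolve_addresses(hostname, port=443):
    """Return the sorted, de-duplicated IP addresses the OS resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True only if there is at least one address and every address is private (e.g. 10.x.x.x)."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python check_dns.py <search-service-name>.search.windows.net
    host = sys.argv[1]
    addresses = resolve_addresses(host)
    print(host, "->", ", ".join(addresses))
    print("all private:", all_private(addresses))
```

If one machine gets a private address and the other a public one (or a different private address), the two clients are not reaching the service over the same path.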

Comparing the network path from both clients

If DNS matches, the next priority is to compare the actual route taken from each client. Azure guidance on performance analysis notes that latency can occur during network transfer as well as during service execution, so this step is important before assuming a backend issue. Please run a route comparison from both clients to the resolved private IP:
tracert <resolved-private-ip>
When comparing results, look for:

different hop counts

unusual detours

added latency on one path

signs of proxy, VPN, or security inspection devices in the route

If one client shows a longer or more complex path, that would strongly suggest the latency is occurring at the network layer rather than within AI Search itself.
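tracert relies on ICMP, which firewalls and network appliances often filter, so intermediate hops may show only timeouts. As a complementary measurement (not a replacement for the route comparison), the TCP handshake time to port 443 on the resolved private IP can be sampled from each client. The Python sketch below assumes you pass the IP address rather than the host name so that DNS is excluded from the timing; the script name and arguments are illustrative.

```python
import socket
import statistics
import sys
import time


def tcp_connect_times_ms(host, port=443, attempts=5, timeout=5.0):
    """Measure only the TCP three-way handshake (no TLS, no HTTP) to host:port, in ms.

    Pass an IP address so that DNS resolution is not included in the timing.
    """
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            # The connection is established once create_connection returns.
            samples.append((time.perf_counter() - start) * 1000)
    return samples


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python tcp_probe.py <resolved-private-ip>
    times = tcp_connect_times_ms(sys.argv[1])
    print(f"median {statistics.median(times):.1f} ms, max {max(times):.1f} ms")
```

A TCP handshake costs roughly one network round trip, so a large gap in these numbers between the two machines points to the network path rather than to Azure AI Search.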

Breaking down client-side timing to isolate where the delay begins

Once DNS and routes have been compared, the next step is to identify whether the delay occurs during:

name resolution

TCP connection setup

TLS handshake

time waiting for the first byte of the response

A curl timing test is often the fastest way to separate these phases:
curl -s -o /dev/null -w "DNS: %{time_namelookup} Connect: %{time_connect} TLS: %{time_appconnect} TTFB: %{time_starttransfer} Total: %{time_total}\n" https://<endpoint>
If DNS, connect, or TLS times differ materially between the two machines, the issue is likely client-side or network-related. If those stages are similar but total time still diverges, service-side telemetry should be checked next.
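curl reports each of these values in seconds, measured cumulatively from the start of the request, so the duration of a single phase is the difference between consecutive values. The Python sketch below converts one line of that output into per-phase durations; it assumes the labels used in the curl template above (DNS, Connect, TLS, TTFB, Total), and the sample line is made up for illustration.

```python
import re

FIELDS = ("DNS", "Connect", "TLS", "TTFB", "Total")


def parse_curl_timing(output):
    """Parse 'DNS: 0.004 Connect: 0.020 ...' into a dict of cumulative seconds."""
    found = dict(re.findall(r"(DNS|Connect|TLS|TTFB|Total):\s*([0-9.]+)", output))
    missing = [name for name in FIELDS if name not in found]
    if missing:
        raise ValueError(f"missing fields in curl output: {missing}")
    return {name: float(found[name]) for name in FIELDS}


def phase_durations_ms(cumulative):
    """Turn curl's cumulative timestamps into the duration of each phase, in ms."""
    dns = cumulative["DNS"]
    connect = cumulative["Connect"]
    # time_appconnect is 0 when no TLS handshake took place (plain HTTP).
    tls = cumulative["TLS"] if cumulative["TLS"] > 0 else connect
    ttfb = cumulative["TTFB"]
    total = cumulative["Total"]
    return {
        "dns": dns * 1000,
        "tcp_connect": (connect - dns) * 1000,
        "tls_handshake": (tls - connect) * 1000,
        # Request send + service processing + one network round trip.
        "wait_first_byte": (ttfb - tls) * 1000,
        "download": (total - ttfb) * 1000,
    }


if __name__ == "__main__":
    sample = "DNS: 0.004 Connect: 0.020 TLS: 0.060 TTFB: 0.150 Total: 0.152"
    for phase, ms in phase_durations_ms(parse_curl_timing(sample)).items():
        print(f"{phase:>16}: {ms:7.1f} ms")
```

Comparing these per-phase numbers side by side usually makes it obvious whether the extra time on the slower machine is spent before the request reaches the service (DNS, TCP, TLS) or while waiting for the first byte.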

Validating Azure AI Search telemetry to compare service-side execution

Azure Monitor metrics and diagnostic logging are the recommended way to understand query performance and to determine whether the issue occurs within the service or elsewhere in the request path. Diagnostic logging is described as essential for monitoring indexing and query operations, and the query monitoring documentation highlights latency, QPS, and throttling as the core signals to review. The main telemetry to review is:

Search latency

Search queries per second (QPS)

Throttled queries

any indexing activity occurring during the same time window

If service-side latency remains stable while one client still reports much higher end-to-end time, the variance is likely outside the service.

Comparing client environment differences

If DNS, routing, and service telemetry do not immediately explain the difference, the remaining likely causes are environment-specific client factors such as:

explicit or transparent proxy configuration

endpoint protection or TLS inspection

VPN or split-tunnel behavior

OS-level networking behavior

differences in connection reuse or pooling

These factors can add measurable latency even when the service and query are identical. This is especially relevant in private networking scenarios where local enterprise controls may affect only one machine or subnet path.
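Two of these factors can be checked from a script. The Python sketch below uses only the standard library and reports (a) the proxy that Python's urllib would choose for the search URL, based on environment variables or, on Windows and macOS, the system proxy settings, and (b) the issuer of the TLS certificate the client actually receives on a direct connection. If the issuer differs between the two machines, something on one path is likely re-signing traffic (TLS inspection). It cannot detect transparent proxies that leave the certificate untouched, and other applications or SDKs may use different proxy settings than urllib; the function names are hypothetical.

```python
import socket
import ssl
import sys
import urllib.request
from urllib.parse import urlsplit


def proxy_for(url):
    """Return the proxy URL urllib would use for url, or None if no proxy applies."""
    parts = urlsplit(url)
    proxy = urllib.request.getproxies().get(parts.scheme)
    if proxy is None or urllib.request.proxy_bypass(parts.hostname or ""):
        return None
    return proxy


def issuer_name(cert):
    """Flatten the 'issuer' field returned by SSLSocket.getpeercert() into one string."""
    return ", ".join(f"{key}={value}" for rdn in cert.get("issuer", ()) for key, value in rdn)


def fetch_issuer(host, port=443, timeout=5.0):
    """Connect directly (no proxy) and return the issuer of the server certificate.

    Raises ssl.SSLCertVerificationError if Python's default CA store does not
    trust the certificate that this machine receives.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return issuer_name(tls.getpeercert())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python client_env.py <search-service-name>.search.windows.net
    host = sys.argv[1]
    print("proxy:", proxy_for(f"https://{host}/"))
    print("certificate issuer:", fetch_issuer(host))
```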

The following references might be helpful; please check them out:

Analyze Performance - Azure AI Search | Microsoft Learn

Monitor Queries - Azure AI Search | Microsoft Learn

Configure Logging - Azure AI Search | Microsoft Learn

Monitor Your Search Service - Azure AI Search | Microsoft Learn

Monitoring Data Reference - Azure AI Search | Microsoft Learn

Create a Private Endpoint for a Secure Connection - Azure AI Search | Microsoft Learn

Azure Private Endpoint private DNS zone values | Microsoft Learn

Azure Private Endpoint DNS Integration Scenarios | Microsoft Learn

Please let us know if the response was helpful.

Thank you
Karnam Venkata Rajeswari 3,240 Reputation points Microsoft External Staff Moderator

2026-06-01T08:15:30.34+00:00

Hello @Sreshta Talluri ,

Following up to see if you had any chance to review the above response.

Thank you

Answer 1

Azure AI Search query latency consists of two main components:

Time spent in the search service processing the query.
Network round-trip time (RTT) between the client and the service.

The following steps isolate and analyze each component.

To distinguish client/network vs. service latency:

Use the elapsed-time header from Azure AI Search responses
- Execute the same REST query from both client machines.
- Inspect the response headers and capture the elapsed-time value (in milliseconds) returned by the service.
- Also capture the total round-trip duration reported by the REST client (for example, the duration shown by the tool or HTTP library).
- The difference between the round-trip duration and elapsed-time is the network + client overhead.
- Example from the documentation:
  - elapsed-time (service processing): 21 ms
  - Round-trip duration: 125 ms
  - Network + client overhead: 104 ms
- Apply the same method on both machines:
  - If elapsed-time is similar (for example, ~20–40 ms) but total duration differs (80–150 ms vs. 500+ ms), the discrepancy is due to network/client-side factors.
  - If elapsed-time itself is significantly higher for one client, that indicates the service is taking longer for those requests and further investigation of service-side load and background processing is needed.
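This comparison can be scripted so that both clients record the two numbers in exactly the same way. The Python sketch below is an illustration, not an official client: it sends one search request with urllib, reads the elapsed-time response header, and times the round trip with time.perf_counter(). The endpoint, index name, API key, and the api-version value (2024-07-01) are placeholders or assumptions; substitute the values your application already uses. Because urllib opens a new connection for every call, the measured round trip also includes DNS, TCP, and TLS setup, which an application that reuses connections would not pay on every query.

```python
import json
import time
import urllib.request


def network_overhead_ms(elapsed_time_ms, round_trip_ms):
    """Network + client overhead = round-trip duration minus service elapsed-time."""
    return round_trip_ms - elapsed_time_ms


def timed_search(endpoint, index, api_key, search_text, api_version="2024-07-01"):
    """Send one search query; return (elapsed-time in ms or None, round trip in ms)."""
    url = f"{endpoint}/indexes/{index}/docs/search?api-version={api_version}"
    request = urllib.request.Request(
        url,
        data=json.dumps({"search": search_text}).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=30) as response:
        response.read()  # include the response body download in the round trip
        round_trip_ms = (time.perf_counter() - start) * 1000
        elapsed = response.headers.get("elapsed-time")
    return (float(elapsed) if elapsed is not None else None), round_trip_ms


if __name__ == "__main__":
    # Numbers from the documentation example quoted above.
    print(network_overhead_ms(21, 125))  # 104
```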
Check for query throttling or high query volume
- Use Azure Monitor logs and Kusto queries to examine query rates and average duration:
  - Look at queries per minute (QPM) and average duration (AvgDurationMS) to see if there are periods where the service is under higher load, which can increase latency.
- If the slower client is issuing queries during a period of higher QPM or when other workloads are active, this can explain higher latency.
Consider indexing and background processing impact
- Indexing operations share resources with queries. High indexing activity can increase query latency.
- Use Kusto queries in Log Analytics (the Azure AI Search performance analysis documentation includes examples) to visualize:
  - Average query latency over time.
  - Queries per minute.
  - Indexing operations per minute.
- Correlate the time windows when the slower client runs queries with indexing spikes or shard merge operations:
  - If indexing or shard merges are active, short-term latency spikes are expected.
  - The documentation notes that indexing and shard merges can cause temporary latency increases while the service completes resource-intensive background work.
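Once the slow requests have been timestamped on the client and the indexing-operations-per-minute results have been exported from Log Analytics, the correlation can be checked with a few lines of code. The Python sketch below assumes both sides use UTC (Log Analytics TimeGenerated values are in UTC) and that indexing counts are keyed by the start of each minute; the function names, input shapes, min_ops threshold, and sample data are all hypothetical.

```python
from datetime import datetime, timezone


def minute_bucket(ts):
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def slow_requests_during_indexing(slow_request_times, indexing_ops_per_minute, min_ops=1):
    """Return the slow-request timestamps whose minute had at least min_ops indexing operations."""
    return [
        ts
        for ts in slow_request_times
        if indexing_ops_per_minute.get(minute_bucket(ts), 0) >= min_ops
    ]


def fraction_during_indexing(slow_request_times, indexing_ops_per_minute, min_ops=1):
    """Share of slow requests that coincided with indexing activity (0.0 when there are none)."""
    if not slow_request_times:
        return 0.0
    matched = slow_requests_during_indexing(slow_request_times, indexing_ops_per_minute, min_ops)
    return len(matched) / len(slow_request_times)


if __name__ == "__main__":
    utc = timezone.utc
    slow = [datetime(2026, 5, 29, 19, 0, 12, tzinfo=utc), datetime(2026, 5, 29, 19, 5, 40, tzinfo=utc)]
    indexing = {datetime(2026, 5, 29, 19, 0, tzinfo=utc): 35}
    print(fraction_during_indexing(slow, indexing))  # 0.5
```

A high fraction suggests the slower client's requests are landing during indexing or merge activity; a fraction near zero points back to the network and client-side checks.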
Network path and region considerations
- Even within the same geographic area (California), network paths can differ between client machines (different ISPs, routes, or peering), affecting RTT.
- The Azure network latency statistics show typical inter-region RTT ranges (for example, US-to-US regional pairs often have 26–50 ms RTT). Those figures are measured between Azure regions, so they are only a rough reference for client-to-region RTT. While both clients are in California and targeting West US 2, their effective RTT can still differ depending on routing.
- If one client’s RTT to West US 2 is higher due to its network path, this will show up as a larger difference between round-trip duration and elapsed-time.
Recommended diagnostic steps
1. On both clients, run the same REST query multiple times and record:
  - elapsed-time header.
  - Total round-trip duration.
  - Time of day.
2. In Azure Monitor:
  - Plot average query latency and QPM for the same time windows.
  - Plot indexing operations per minute.
  - Look for correlation between higher latency and indexing or high QPM.
3. If elapsed-time is consistently low and similar on both clients but total duration differs:
  - Focus on client/network:
    - Compare DNS resolution paths and ensure both clients resolve the same endpoint and IP.
    - Check for local network constraints (VPN, proxies, firewalls) on the slower client.
4. If elapsed-time is higher for the slower client’s requests:
  - Correlate with indexing and background processing (shard merges) as described above.
  - Consider scheduling heavy indexing jobs during off-peak hours or scaling the search service if sustained load is high.
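Steps 3 and 4 amount to a simple decision rule that can be applied to the samples collected in step 1. The Python sketch below assumes each sample is an (elapsed-time ms, round-trip ms) pair from one client; the 50 ms gap thresholds and function names are hypothetical and should be tuned to your own baseline.

```python
import statistics


def summarize(samples):
    """Median service time and median network + client overhead for (elapsed_ms, round_trip_ms) pairs."""
    elapsed = statistics.median([e for e, _ in samples])
    overhead = statistics.median([rt - e for e, rt in samples])
    return elapsed, overhead


def diagnose(fast_samples, slow_samples, elapsed_gap_ms=50.0, overhead_gap_ms=50.0):
    """Decide whether the slower client's extra time is service-side or network/client-side."""
    fast_elapsed, fast_overhead = summarize(fast_samples)
    slow_elapsed, slow_overhead = summarize(slow_samples)
    # Step 4: the service itself takes longer for the slower client's requests.
    if slow_elapsed - fast_elapsed >= elapsed_gap_ms:
        return "service-side: correlate with indexing, shard merges and query volume"
    # Step 3: similar elapsed-time, but more time spent outside the service.
    if slow_overhead - fast_overhead >= overhead_gap_ms:
        return "client/network: compare DNS, routes, proxies and firewalls"
    return "inconclusive: collect more samples at different times of day"


if __name__ == "__main__":
    # Made-up samples: similar elapsed-time, very different round trips.
    fast = [(25, 110), (30, 120), (22, 95)]
    slow = [(28, 520), (30, 560), (26, 505)]
    print(diagnose(fast, slow))
```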

These steps allow separating service-side processing time from client/network latency and identifying whether the discrepancy is due to network routing, background indexing/merge operations, or overall query volume.
