Azure Search SDK Connection Reset Error - ServiceResponseError

Rishav Arora 25 Reputation points
2025-07-02T12:23:46.1333333+00:00

Issue Description

We're experiencing intermittent ServiceResponseError exceptions ("Connection reset by peer") when using the Azure Search SDK in a Python application. The error is raised while iterating over search results, inside the SDK's paging code.

Error Details

azure.core.exceptions.ServiceResponseError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

Stack Trace

The error occurs in the Azure Search SDK's paging mechanism:

  File "/usr/local/lib/python3.9/site-packages/azure/search/documents/_paging.py", line 54, in __next__
    return next(self._page_iterator)
  File "/usr/local/lib/python3.9/site-packages/azure/core/paging.py", line 75, in __next__
    self._response = self._get_next(self.continuation_token)
  File "/usr/local/lib/python3.9/site-packages/azure/search/documents/_paging.py", line 125, in _get_next_cb
    return self._client.documents.search_post(search_request=self._initial_query.request, **self._kwargs)
  [... continues through Azure SDK pipeline ...]
  File "/usr/local/lib/python3.9/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 409, in send
    raise error
azure.core.exceptions.ServiceResponseError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

Environment

  • Python Version: 3.9
  • Azure Search SDK: azure-search-documents (version 11.5.2)
  • Deployment: AKS
  • Operating System: Linux (containerized)

Code Pattern

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# SEARCH_ENDPOINT, INDEX_NAME, API_KEY and build_filter_string() are defined elsewhere in the application.
SEARCH_CLIENT = None  # module-level cache for the singleton client


def perform_search(search_query, filters):
    # Initialize Azure Search client (singleton pattern)
    search_client = get_search_client()

    # Build filter string from dictionary
    filter_text = build_filter_string(filters)

    # Perform search with parameters. search() is lazy: it returns a pager and
    # the HTTP request is only sent once iteration starts.
    results = search_client.search(
        search_text=search_query,
        search_fields=['searchable_field'],
        query_type='full',
        filter=filter_text,
        top=100
    )

    # Error occurs during result iteration
    search_results_list = []
    for result in results:  # <-- Error occurs here
        search_results_list.append({k: v for k, v in result.items() if v is not None})

    return search_results_list


def get_search_client():
    global SEARCH_CLIENT
    if SEARCH_CLIENT is not None:
        return SEARCH_CLIENT

    SEARCH_CLIENT = SearchClient(
        endpoint=SEARCH_ENDPOINT,
        index_name=INDEX_NAME,
        credential=AzureKeyCredential(API_KEY)
    )
    return SEARCH_CLIENT

Request Details

  • Search Query: Simple text search with wildcard (e.g., "GLUCONORM*")
  • Filter: Single equality filter
  • Result Limit: 100 items
  • Query Type: 'full'

Questions

  1. Is this a known issue with the Azure Search SDK's paging mechanism?
  2. Are there recommended retry strategies for handling connection resets during result iteration?
  3. Should we implement connection pooling or modify the client initialization pattern?
  4. Are there specific timeout settings that might help prevent these connection resets?

Additional Context

  • The error is intermittent and doesn't occur on every search request
  • We're using a singleton pattern for the search client
  • The application runs in an AKS (Azure Kubernetes Service) environment
  • Network connectivity to Azure services is generally stable

Any guidance on best practices for handling this type of connection error would be greatly appreciated.

Azure AI Search

1 answer

  Bhargavi Naragani 6,535 Reputation points Microsoft External Staff Moderator
    2025-07-03T05:14:07.1166667+00:00

    Hi Rishav Arora,

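    It seems like you're encountering intermittent ConnectionResetError(104, 'Connection reset by peer') errors when using the Azure Search SDK (azure-search-documents==11.5.2) while iterating over paged results in a Python app deployed on AKS. This is a well-known class of transient network failure rather than a bug specific to the SDK's paging mechanism: the paging code is simply where the HTTP request is made, and the error is raised by the transport when the TCP connection to Azure AI Search is closed by the remote side or by a device in between. Common causes are network instability, idle timeouts that silently drop pooled connections, and reuse of stale sockets in containerized environments like AKS.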

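    1. Use RetryPolicy from azure-core to tune automatic retries of transient errors such as connection resets. RetryPolicy is a pipeline policy, so pass it to the client rather than to the transport:
    from azure.core.credentials import AzureKeyCredential
    from azure.core.pipeline.policies import RetryPolicy
    from azure.search.documents import SearchClient

    retry_policy = RetryPolicy(
        retry_total=5,
        retry_connect=2,
        retry_read=2,
        retry_status=2,
        retry_backoff_factor=0.8,
        retry_backoff_max=30
    )
    search_client = SearchClient(
        endpoint=SEARCH_ENDPOINT,
        index_name=INDEX_NAME,
        credential=AzureKeyCredential(API_KEY),
        retry_policy=retry_policy
    )

    Note that, by default, azure-core's RetryPolicy only retries read-side failures such as a connection reset for idempotent HTTP methods (GET, PUT, DELETE and similar). Search pages are fetched with POST (search_post in the stack trace), so this policy may not retry this particular error on its own; it is worth confirming against your installed azure-core version, and the application-level retry around iteration described below covers that case.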

    https://learn.microsoft.com/en-us/python/api/azure-core/azure.core.pipeline.policies.retrypolicy?view=azure-python

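    2. Set explicit connection and read timeouts (in seconds) so that a dead connection fails fast instead of hanging, then pass the transport to SearchClient with transport=transport:
    from azure.core.pipeline.transport import RequestsTransport
    transport = RequestsTransport(connection_timeout=10, read_timeout=30)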
    

    https://azuresdkdocs.z19.web.core.windows.net/python/azure-core/latest/azure.core.pipeline.policies.html

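    3. Avoid long-lived singleton clients in AKS. A long-lived client keeps pooled TCP connections open, and in environments like AKS idle connections can be silently dropped by load balancers or NAT devices (or go stale when something on the network path restarts). A request later sent on such a socket can then fail with a connection reset. You can:
    • Re-create the SearchClient per request (simplest, but gives up connection reuse), or
    • Use a TTL-based cache that re-instantiates the client periodically (e.g., every 5–10 minutes)

    Either approach avoids reusing broken connections.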
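    The TTL-based option can be sketched as below. This is a minimal sketch, not an Azure SDK feature: TTLClientCache, its ttl_seconds default and the injectable clock are hypothetical names chosen for illustration, and the factory is whatever builds your client (for example a lambda that calls SearchClient(...)).

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLClientCache(Generic[T]):
    """Hypothetical helper: share one client, rebuild it after ttl_seconds."""

    def __init__(self, factory: Callable[[], T], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._lock = threading.Lock()
        self._client: Optional[T] = None
        self._created_at = 0.0

    def get(self) -> T:
        with self._lock:
            now = self._clock()
            if self._client is None or now - self._created_at >= self._ttl:
                # Build a fresh client, and with it a fresh connection pool.
                # The old client is not closed here because other threads may
                # still be iterating results from it; it is left to garbage collection.
                self._client = self._factory()
                self._created_at = now
            return self._client

    def invalidate(self) -> None:
        """Force the next get() to build a new client, e.g. after a connection error."""
        with self._lock:
            self._client = None
```

    In the question's code, get_search_client() could then return cache.get() from a module-level TTLClientCache(lambda: SearchClient(endpoint=SEARCH_ENDPOINT, index_name=INDEX_NAME, credential=AzureKeyCredential(API_KEY)), ttl_seconds=300). The lock keeps concurrent requests from building several clients at the moment the TTL expires.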

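    4. Handle errors during iteration. Because the pager fetches pages lazily, wrap the search call and the loop together and re-run the query on each attempt:
    import logging
    import time
    from azure.core.exceptions import ServiceResponseError

    log = logging.getLogger(__name__)

    def search_with_retry(search_client, max_attempts=3, **search_kwargs):
        for attempt in range(max_attempts):
            try:
                # Re-issue the query on every attempt; a pager that failed
                # mid-iteration should not be reused.
                results = search_client.search(**search_kwargs)
                return [dict(result) for result in results]
            except ServiceResponseError as e:
                log.warning("Search attempt %d/%d failed: %s", attempt + 1, max_attempts, e)
                if attempt == max_attempts - 1:
                    raise  # give up instead of silently returning nothing
                time.sleep(2 ** attempt)  # backoff: 1 s, then 2 s

    This catches failures raised while pages are being fetched, retries the whole query with exponential backoff, discards partial results from the failed attempt, and re-raises if every attempt fails.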
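    The suggestions to avoid long-lived clients and to retry around iteration can also be combined, so that each retry runs on a freshly built client. This is a minimal sketch under stated assumptions: search_rebuilding_client, get_client and reset_client are hypothetical names (the two callables would wrap however your application caches its SearchClient), and in a real application retry_on would be (ServiceResponseError,) from azure.core.exceptions. The sleep function is injectable so the backoff can be tested without waiting.

```python
import logging
import time
from typing import Any, Callable, Dict, List, Tuple, Type

log = logging.getLogger(__name__)


def search_rebuilding_client(
    get_client: Callable[[], Any],
    reset_client: Callable[[], None],
    search_kwargs: Dict[str, Any],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Hypothetical helper: run a search, read every page, retry the whole query on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            # For SearchClient, search() is lazy: iterating is what sends the
            # HTTP requests, so materializing the results must stay inside the try.
            results = get_client().search(**search_kwargs)
            return [dict(item) for item in results]
        except retry_on as exc:
            log.warning("search attempt %d/%d failed: %s", attempt + 1, max_attempts, exc)
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of partial data
            # Drop the cached client so the next attempt opens fresh connections.
            reset_client()
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # every iteration returns or raises
```

    Resetting the client before retrying makes it less likely that the retry is sent over another dead pooled socket; the trade-off is losing connection reuse for that one request.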

    5. If the cluster makes a large number of outbound connections (especially when egress goes through the Azure Load Balancer), you might exhaust SNAT ports: https://github.com/jometzg/diagnosing-aks-port-exhaustion

    The simplest mitigation is to deploy a NAT Gateway on your AKS subnet:
    https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/connectivity/snat-port-exhaustion?tabs=for-a-linux-pod

    Hope this helps. If you have any further concerns or questions, please feel free to reach out to us.

