
Azure Backup for SQL: Persistence of "Not Reachable" status in high-density instances (>1,500 DBs)

Fabián Cordero Granados 20 reputation points
2026-03-27T15:47:07.1333333+00:00

I am managing an Azure VM running SQL Server with a high density of databases (approx. 1,500–1,600 DBs). We are using Azure Backup for SQL Server in Azure VMs via a Recovery Services Vault (RSV).

Issue: The SQL instance consistently shows a status of "Not Reachable" or "Incomplete" in the Azure Portal, even though the SQL Engine is healthy and the extension services are running.

Steps already taken:

  • Restarted the extension services: IaaSWorkloadCoordinatorService and IaasWLPluginSvc.
  • Triggered "Rediscover DBs" from the Recovery Services Vault multiple times.
  • Verified that there are no Antivirus locks on C:\Program Files\Azure Workload Backup.
  • Confirmed that the SQL instance is reachable locally and via standard management tools.

Observation: Despite the official documentation stating a limit of 2,000 databases per Vault, we are seeing synchronization timeouts or metadata inconsistencies as we approach the 1,500+ mark on a single node. It seems the local metadata cache (\Metadata folder) is failing to refresh or upload the inventory correctly within the default timeout windows.

Questions:

Is there a known performance degradation or "soft limit" for the Azure Workload Backup extension when handling more than 1,500 databases on a single VM?

Are there specific Registry Keys or configuration settings to extend the discovery/handshake timeouts for high-density SQL environments?

Besides a manual purge of the local XML metadata cache, is there a recommended "forced-refresh" method to bypass stale inventory data in the Azure backend?

Any guidance on tuning the extension for high-scale environments would be greatly appreciated.

Azure Backup

An Azure backup service that provides built-in management at scale.

0 comments

1 answer
  1. Suchitra Suregaunkar 14,595 reputation points Microsoft External Staff Moderator
    2026-03-27T19:51:07.67+00:00

    Hello Fabián Cordero Granados

    Thank you for the detailed problem description and for outlining the troubleshooting already performed.

    After reviewing the scenario against official Microsoft documentation, there are a few important points to clarify regarding Azure Backup for SQL Server on Azure VMs.

    1. Supported database scale: Microsoft officially documents the following limit for Azure Backup of SQL Server in Azure VMs:

    “Maximum number of databases that can be protected on a server (and in a vault): 2000.”

    “Beyond this limit, performance issues may come up.”

    This limit is per SQL Server instance, not just per vault. However, Microsoft documentation does not define or publish any smaller ‘soft limit’ (for example, 1,500 databases). Any behavior observed below the 2,000‑database limit is not described as a separate threshold in official docs.
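    To put the reported counts in context, here is a minimal sketch of the headroom arithmetic against the documented figure. The function name `headroom` is illustrative only, and the calculation says nothing about where performance actually starts to degrade.

```python
DOCUMENTED_LIMIT = 2000  # per-server figure quoted from the documentation above


def headroom(db_count: int, limit: int = DOCUMENTED_LIMIT) -> tuple[int, float]:
    """Return (databases remaining before the limit, fraction of the limit in use)."""
    if db_count < 0:
        raise ValueError("db_count must be non-negative")
    return limit - db_count, db_count / limit


# The question reports roughly 1,500 to 1,600 databases on one instance.
for count in (1500, 1600):
    remaining, used = headroom(count)
    print(f"{count} DBs: {remaining} below the limit, {used:.0%} of it in use")
```

    At 1,500 to 1,600 databases the instance is using 75% to 80% of the documented maximum, so it is inside the supported range but with limited margin.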

    2. Meaning of “Not Reachable” / “Incomplete” status:

    The AzureBackupWindowsWorkload extension is responsible for:

    • Discovering SQL instances and databases
    • Uploading metadata to the Recovery Services Vault
    • Maintaining communication between the VM and the vault

    If discovery or metadata sync does not complete successfully, the portal may show the SQL instance as “Not Reachable” or “Incomplete”, even when SQL Server itself is running normally.

    3. Can discovery timeouts or metadata sync be tuned?

    No. Microsoft does not provide registry keys, configuration settings, or supported timeout extensions to modify SQL discovery, handshake, or metadata upload behavior for the Azure Backup workload extension.

    The supported recovery actions are limited to:

    • Restarting the workload services (IaaSWorkloadCoordinatorService, IaasWLPluginSvc)
    • Triggering Rediscover DBs from the Recovery Services Vault

    No additional forced refresh or backend reset mechanism is supported.
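    As a rough illustration of the first supported action, the sketch below builds PowerShell `Restart-Service` commands for the two services named above and prints them in dry-run mode. It assumes those names are the registered Windows service names and that the script would run elevated on the SQL VM. The helpers `restart_commands` and `run` are hypothetical and not part of any Azure tooling.

```python
import subprocess

# Service names as given in this thread; assumed to be the registered service names.
SERVICES = ["IaaSWorkloadCoordinatorService", "IaasWLPluginSvc"]


def restart_commands(services=SERVICES):
    """Build one PowerShell Restart-Service command line per service."""
    return [
        ["powershell.exe", "-NoProfile", "-Command", f"Restart-Service -Name '{name}'"]
        for name in services
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False (Windows only)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Dry run by default: nothing is executed, the commands are only shown.
run(restart_commands())
```

    Rediscovery would still be triggered from the Recovery Services Vault after the services are back up.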

    4. Recommended supported resolution:

    Because Azure Backup for SQL relies on full database discovery at the instance level, Microsoft guidance focuses on architecture, not tuning:

    Recommended approach:

    • Keep individual SQL Server instances within supported operational scale
    • Distribute very large numbers of databases across multiple SQL instances or SQL VMs
    • Reduce database density per instance if discovery failures persist
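    The distribution idea above can be sketched as a simple planning step. This is an illustration under stated assumptions: the per-instance target of 1,000 is a hypothetical value chosen for the example, not a Microsoft recommendation, and `plan_distribution` is an invented helper name.

```python
import math


def plan_distribution(databases: list[str], max_per_instance: int) -> list[list[str]]:
    """Split database names into the fewest groups that each hold at most max_per_instance."""
    if max_per_instance <= 0:
        raise ValueError("max_per_instance must be positive")
    if not databases:
        return []
    n_groups = math.ceil(len(databases) / max_per_instance)
    # Round-robin slicing keeps group sizes within one database of each other.
    return [databases[i::n_groups] for i in range(n_groups)]


# Example: 1,600 placeholder database names, hypothetical target of 1,000 per instance.
dbs = [f"db_{i:04d}" for i in range(1600)]
groups = plan_distribution(dbs, max_per_instance=1000)
print([len(g) for g in groups])  # [800, 800]
```

    Balanced groups are used here instead of filling one instance to the target first, so that no single instance ends up closer to the limit than necessary.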

    If the environment design requires a single SQL instance with a very high number of databases and the instance remains stuck in Not Reachable or Incomplete state despite supported recovery steps, the next action is to open a Microsoft Support ticket.

    This allows the Azure Backup engineering team to:

    Thanks,

    Suchitra.
