
Synapse dedicated SQL pool external table failure with COPY schema discovery error after restart

Michael Clemans 140 Reputation points
2026-05-13T18:01:24.42+00:00

Hello,

We encountered an issue where external Parquet-based tables in a dedicated SQL pool started failing with the following error:

COPY statement input file schema discovery failed: Cannot process the file https://

Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.


2 answers

  1. SAI JAGADEESH KUDIPUDI 3,465 Reputation points Microsoft External Staff Moderator
    2026-05-13T19:27:33.9533333+00:00

    Hi Michael Clemans,

    Yes, transient Azure service or networking outages can sometimes trigger this behavior in Synapse Dedicated SQL Pool external table access.

    External tables in dedicated SQL pools rely on multiple backend components working together, including:

    • PolyBase / COPY engine services
    • Data Movement Service (DMS)
    • ADLS Gen2 connectivity
    • Authentication/token validation
    • Internal Azure networking between Synapse compute nodes and storage

    If there is a temporary disruption in any of those layers — for example a regional networking issue, storage-access interruption, backend service update, or transient platform outage — the SQL pool may enter a degraded state where external file reads start failing with errors like:

    “COPY statement input file schema discovery failed … file could not be opened.”

    In some cases, the affected pool does not automatically recover its storage connectivity or session state even after the underlying issue clears. Restarting the pool, or pausing and resuming it, forces the dedicated SQL pool to reinitialize its compute nodes, PolyBase services, and storage connections, which is why the issue resolved immediately afterward.

    Your observation that the always-running pool was affected, while the periodically restarted pool was not, also aligns with this type of transient runtime-state issue.
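
    If you want client code to tell this failure mode apart from ordinary query errors, a retry wrapper along the following lines may help. This is only a minimal sketch: the marker strings are taken from the error text quoted in this thread, and `run_with_retry`, `looks_like_stuck_external_read`, and `PoolRestartSuggested` are hypothetical names, not part of any Synapse SDK. `run_query` stands for whatever callable executes your external table query.

```python
import time

# Substrings copied from the error text quoted in this thread. Matching on
# them is an assumption for illustration, not a documented contract.
TRANSIENT_MARKERS = (
    "schema discovery failed",
    "could not be opened",
)


class PoolRestartSuggested(Exception):
    """Raised when retries are exhausted and the pool may need a pause/resume."""


def looks_like_stuck_external_read(message):
    """Return True if the error text resembles the failure discussed here."""
    lowered = message.lower()
    return any(marker in lowered for marker in TRANSIENT_MARKERS)


def run_with_retry(run_query, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call run_query(), retrying with exponential backoff on matching errors."""
    for attempt in range(attempts):
        try:
            return run_query()
        except Exception as exc:
            if not looks_like_stuck_external_read(str(exc)):
                raise  # unrelated failure: do not mask it
            if attempt == attempts - 1:
                # Retries did not help, so the pool state itself is suspect.
                raise PoolRestartSuggested(str(exc)) from exc
            sleep(base_delay * 2 ** attempt)
```

    If the error persists across all retries, the wrapper raises `PoolRestartSuggested`, which an operator or scheduler could treat as a signal to pause and resume the pool rather than to keep retrying.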

    Additionally, not every transient backend issue or short-duration platform disruption results in a public Azure status announcement. Some temporary service-side or networking issues may self-recover quickly or affect only a subset of infrastructure, so customers can sometimes observe intermittent failures even when there is no active public incident posted.

    At this time there is no public Microsoft documentation confirming a direct relationship between the Microsoft 365 incident and Synapse connectivity, but temporary Azure infrastructure or networking disruptions can indirectly impact external table access behavior.

    Relevant documentation:

    https://learn.microsoft.com/azure/synapse-analytics/sql/develop-tables-external-tables

    https://learn.microsoft.com/azure/synapse-analytics/known-issues

    https://learn.microsoft.com/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.



  2. AI answer

    2026-05-13T18:01:49.3233333+00:00

    The behavior described is consistent with a transient issue in the dedicated SQL pool engine or its connectivity to storage, rather than a permanent problem with the external table definitions or Parquet files.

    From the available guidance:

    • External tables over files in Azure Storage (Blob/ADLS) are known to be sensitive to storage access and network conditions. When storage is protected by firewalls or network rules, failures like “file cannot be opened” or “directory cannot be listed” are explicitly called out as connectivity issues between Synapse and the storage account, not schema or metadata problems.
    • For some service‑side issues, Microsoft explicitly recommends pausing and resuming (effectively restarting) the dedicated SQL pool to restore normal operation. For example, when the Distribution_nn_Cache database cannot be opened due to inaccessible files or insufficient memory/disk, the mitigation is to pause and resume the pool, which resets the service state.
    • There is also a general pattern in Synapse known issues and troubleshooting guidance where pausing/resuming is used to clear internal state or recover from transient service health issues.

    Given that:

    1. Multiple external tables failed with the same schema discovery / file open error.
    2. No configuration changes were made to the external tables or storage.
    3. The issue disappeared immediately after restarting the dedicated SQL pool.

    The most likely explanations, based on documented patterns, are:

    • A transient service health or engine state issue in the dedicated SQL pool (for example, an internal cache or metadata/database state problem similar in nature to the Distribution_nn_Cache issue) that was cleared by the restart.
    • A temporary connectivity or network path problem between the dedicated SQL pool and the ADLS Gen2 account, which was resolved when the pool was paused/resumed and re‑established its connections.

    Why a restart helps in these scenarios:

    • Pausing/resuming a dedicated SQL pool tears down and reinitializes the compute nodes, caches, and internal databases. This can:
      • Clear corrupted or stale internal state related to external table metadata or file handles.
      • Re‑establish network connections and authentication paths to external storage.
      • Recover from transient service health issues that affect access to internal or external storage.
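
    For anyone scripting the pause/resume step, the sketch below builds the Azure Resource Manager URL that would be POSTed to. It assumes a dedicated SQL pool inside a Synapse workspace (the `Microsoft.Synapse` resource provider) and an API version of `2021-06-01`; check both against the current REST reference before relying on them. The function name `pool_action_url` is hypothetical, and the sketch only builds the URL: it sends no request and handles no authentication.

```python
from urllib.parse import quote

ARM_BASE = "https://management.azure.com"
API_VERSION = "2021-06-01"  # assumption: verify against the current REST reference


def pool_action_url(subscription_id, resource_group, workspace, pool, action):
    """Build the ARM URL for pausing or resuming a workspace dedicated SQL pool."""
    if action not in ("pause", "resume"):
        raise ValueError("action must be 'pause' or 'resume'")
    # Percent-encode each caller-supplied path segment.
    path = "/".join([
        "subscriptions", quote(subscription_id, safe=""),
        "resourceGroups", quote(resource_group, safe=""),
        "providers", "Microsoft.Synapse",
        "workspaces", quote(workspace, safe=""),
        "sqlPools", quote(pool, safe=""),
        action,
    ])
    return f"{ARM_BASE}/{path}?api-version={API_VERSION}"
```

    Calling it once with `"pause"` and once with `"resume"` gives the two URLs for a scripted restart. Standalone dedicated SQL pools (formerly SQL DW) sit under a different resource provider, so this path would not apply to them.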

    Is it transient engine behavior or storage connectivity?

    • The error message “file could not be opened” and the fact that other pools and the storage account continued to work point toward transient engine or connectivity behavior, not a persistent storage configuration problem.
    • The documented mitigations for similar “inaccessible files” / service health issues explicitly include pausing and resuming the dedicated SQL pool, which aligns with what resolved the issue in this case.

    In summary, this pattern is aligned with known transient issues in Synapse dedicated SQL pools where pausing/resuming is the recommended recovery step. It is typically attributed to temporary engine/service health or storage connectivity conditions rather than permanent schema or configuration problems.


