
Synapse Mapping Data Flow Sink creates unexpected empty 0-byte files when writing Parquet to Azure Blob Storage

Shinde, Dushyant 0 Reputation points
2026-02-02T15:23:04.52+00:00

Issue Summary :

When using Azure Synapse Analytics Mapping Data Flow to write an updated Parquet file from one Blob Storage container to another, the Sink creates:

  1. The expected Parquet file inside the correct folder structure :
    • <container>/<customer_no>/<month_folder>/<file>.parquet
  2. Unexpected 0-byte blobs at intermediate folder levels:
    • <container>/<customer_no>
    • <container>/<customer_no>/<month_folder>

In the Azure Portal, each of these names appears twice: once as a folder and once as an empty (0-byte) file with the same name as the folder. These extra blobs should NOT be created.


Environment Details :

  • Service : Azure Synapse Analytics
  • Component : Mapping Data Flow (ADF/Synapse)
  • Compute : AutoResolveIntegrationRuntime (Data Flow runtime)
  • Source Storage : Azure Blob Storage
  • Target Storage : Azure Blob Storage
  • File format : Parquet
  • Sink settings :
    • Sink type : Integration dataset (Parquet)
    • File name option : "Output to single file"
    • File name : Passed dynamically through parameter (original file name)
    • Directory : Dynamically constructed (customer_no / month_folder)
    • Partitioning : Single partition

Expected Behavior :

  • Only the updated Parquet file should be written :
enriched-metric-reports/<customer_no>/<month_folder>/<original_file_name>.parquet
  • No additional empty blobs or marker files should be created.

Actual Behavior :

Synapse Data Flow writes:

1. Expected file :

Correct Parquet file appears under :

enriched-metric-reports/341/January-2026/<original_file_name>.parquet

2. Unexpected empty files (0 KB) :

These are created automatically :

enriched-metric-reports/341 (0 bytes)

enriched-metric-reports/341/January-2026 (0 bytes)

Azure Portal shows:

  • Folder 341
  • File 341 (0 bytes)
  • Folder January-2026
  • File January-2026 (0 bytes)

These blobs should not be generated.


We need clarification from Microsoft Azure Synapse team whether this behavior is:

  1. Expected,
  2. A known limitation, or
  3. A bug in Synapse Data Flow Sink for Blob Storage.

Attachments :

[Five screenshots; images not reproduced]

Azure Synapse Analytics

1 answer

  1. Pilladi Padma Sai Manisha 9,055 Reputation points Microsoft External Staff Moderator
    2026-02-02T15:58:10.3133333+00:00

    Hi Shinde, Dushyant,
    Thank you for reaching out on Microsoft Q&A, and for the detailed screenshots and configuration. What you are observing is expected behavior when Azure Synapse Mapping Data Flow writes a single Parquet file to Azure Blob Storage using a dynamic folder path and an integration dataset.

    Azure Blob Storage does not have a true hierarchical file system. The folders shown in the portal are only a visual interpretation of blob names that contain /. When Mapping Data Flow (Spark runtime) writes to Blob with:

      • Dynamic directory path (for example: customer_no/month_folder)
      • File name option = Output to single file
      • Integration dataset sink
      • Parquet format

    the Spark commit protocol creates 0-byte path marker blobs at each directory level before committing the final file. These blobs have the same names as the folder segments, which is why the portal shows them as both a folder and a 0-byte file.

    Example of what Spark writes internally:

    enriched-metric-reports/341                      ← marker blob (0 bytes)
    enriched-metric-reports/341/January-2026        ← marker blob (0 bytes)
    enriched-metric-reports/341/January-2026/file.parquet
    

    This is by design and comes from the Spark FileOutputCommitter behavior when writing to non-hierarchical storage such as Azure Blob. It is not a Synapse defect and does not indicate data corruption.
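To check which blobs in a container are markers of this kind, one option is to look for 0-byte blobs whose name is also a folder prefix of another blob. The sketch below works on plain `(name, size)` pairs so that it runs without any Azure dependency. The function name `find_marker_blobs` and the 2048-byte file size are made up for illustration. In practice the pairs could come from a container listing (for example, `ContainerClient.list_blobs()` in the `azure-storage-blob` package).

```python
from typing import Iterable, List, Tuple


def find_marker_blobs(blobs: Iterable[Tuple[str, int]]) -> List[str]:
    """Return names of 0-byte blobs that are also a folder prefix of another blob."""
    entries = list(blobs)
    names = [name for name, _ in entries]
    markers = []
    for name, size in entries:
        folder_prefix = name.rstrip("/") + "/"
        # A marker is empty AND at least one other blob lives "under" its name.
        if size == 0 and any(other.startswith(folder_prefix) for other in names):
            markers.append(name)
    return sorted(markers)


# Blob names relative to the container, mirroring the listing above.
# The size of the Parquet file is a hypothetical value.
listing = [
    ("341", 0),
    ("341/January-2026", 0),
    ("341/January-2026/file.parquet", 2048),
]

print(find_marker_blobs(listing))  # ['341', '341/January-2026']
```

An empty blob that has nothing beneath it (for example a legitimately empty output file) is not reported, because the prefix test fails for it.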

    This behavior does not occur when:

      • Writing to ADLS Gen2 (true hierarchical namespace)
      • Using an inline dataset sink
      • Allowing Spark to write multiple partition files instead of a single file

    The 0-byte blobs are expected Spark path markers required for committing the file to Azure Blob Storage and can be safely ignored. If you want to avoid these marker blobs entirely, the recommended approach is to use ADLS Gen2 or change the sink configuration (inline dataset or multi-file output).
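If the markers cannot simply be ignored (for instance, a downstream job lists every blob in the container), a cleanup pass after the Data Flow run has finished is one possible workaround. This is a sketch under assumptions, not a Microsoft-recommended procedure. It assumes a flat-namespace Blob Storage account and a `container_client` object that behaves like `azure.storage.blob.ContainerClient`, that is, `list_blobs(name_starts_with=...)` yields items with `.name` and `.size`, and `delete_blob(name)` removes one blob. The function name `delete_marker_blobs` and the `dry_run` flag are hypothetical.

```python
from typing import List, Optional


def delete_marker_blobs(container_client,
                        prefix: Optional[str] = None,
                        dry_run: bool = True) -> List[str]:
    """Report (and, unless dry_run, delete) 0-byte blobs that double as folder prefixes.

    Assumption: container_client mimics azure.storage.blob.ContainerClient.
    Intended for flat-namespace accounts only; run it after the write has completed.
    """
    blobs = [(b.name, b.size)
             for b in container_client.list_blobs(name_starts_with=prefix)]
    names = [name for name, _ in blobs]

    # A marker is an empty blob with at least one other blob "under" its name.
    markers = sorted(
        name for name, size in blobs
        if size == 0
        and any(other.startswith(name.rstrip("/") + "/") for other in names)
    )

    if not dry_run:
        for name in markers:
            container_client.delete_blob(name)
    return markers
```

With the real SDK, the client could be created with `ContainerClient.from_connection_string(conn_str, container_name)`. Calling the function with `dry_run=True` first returns the candidate names without deleting anything, which makes it easy to review the list before running it again with `dry_run=False`.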
