Using Snowflake as a Source Dataset in Azure Data Factory without Blob Storage

Gauri Joshi 0 Reputation points
2025-06-17T08:43:10.1866667+00:00

In Azure Data Factory, I am using a Copy Data activity with a Snowflake source dataset and an ADLS Gen2 sink. However, I am running into problems when configuring the source dataset to work without Azure Blob Storage.

Enabling staging causes compatibility issues with ADLS Gen2, while disabling staging leads to failures unless Azure Blob Storage is used. What are the best practices or solutions for using a Snowflake dataset directly as a source without relying on Blob Storage?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Vinodh247 34,661 Reputation points MVP Volunteer Moderator
    2025-06-17T08:59:44.3166667+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    Why is Blob Storage typically required?

    • When staging is enabled, ADF uses Snowflake's COPY INTO command with an external stage, which requires Blob Storage; ADLS Gen2 is not natively supported as a staging location.

    • When staging is disabled, ADF falls back to row-by-row JDBC-based extraction, which is very slow and prone to timeouts or failures with large datasets.


    Suggestions & best practices:

    1. Use JDBC-based copy (disable staging) for small/medium datasets

    In the copy activity settings, disable staging (set "Enable staging" = false); a minimal sketch follows the limitations below.

    This works without Blob Storage, but it is only suitable for:

    • Smaller datasets (up to a few million rows, depending on row size).

    • Lower throughput needs.

    Limitations:

    • Slower due to row-by-row reads.

    • No parallelism.

    • May hit timeouts with large datasets.
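
    For reference, here is a minimal sketch of what the copy activity could look like with staging disabled, written as a Python dict that mirrors the JSON shown in ADF Studio's "Code" view. The dataset names and the query are placeholders, not values from your factory.

```python
import json

# Minimal sketch of a Copy activity with staging disabled (all names are placeholders).
# This mirrors the JSON you would see in the pipeline's "Code" view in ADF Studio.
copy_activity_no_staging = {
    "name": "CopySnowflakeToAdls",
    "type": "Copy",
    "inputs": [{"referenceName": "SnowflakeSourceDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "AdlsGen2SinkDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {
            "type": "SnowflakeSource",
            # Query pushed down to Snowflake; rows stream back through the integration runtime.
            "query": "SELECT * FROM MY_DB.MY_SCHEMA.MY_TABLE",
        },
        "sink": {"type": "ParquetSink"},
        # No interim Blob Storage is referenced anywhere.
        "enableStaging": False,
    },
}

print(json.dumps(copy_activity_no_staging, indent=2))
```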

    2. Use a self-hosted integration runtime (SHIR) with staging disabled

    • If you use a SHIR for your copy activity, set staging = false.

    • This avoids Azure-hosted IR bottlenecks and gives better throughput than the default row-by-row copy.

    • Still no Blob Storage needed, and performance improves over the default auto-resolve IR. A sketch of a Snowflake linked service bound to a SHIR follows.
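
    As an illustration, a Snowflake linked service can be pinned to a SHIR through its "connectVia" property. The sketch below uses placeholder names and a dummy connection string; adapt it to your own runtime and account.

```python
# Sketch of a Snowflake linked service bound to a self-hosted integration runtime.
# "MySelfHostedIR" and the connection string are placeholders, not working values.
snowflake_linked_service = {
    "name": "SnowflakeLinkedService",
    "properties": {
        "type": "Snowflake",
        "typeProperties": {
            "connectionString": "jdbc:snowflake://<account>.snowflakecomputing.com/?db=MY_DB&warehouse=MY_WH",
        },
        # Routes all traffic for this linked service through the self-hosted IR
        # instead of the auto-resolve Azure IR.
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}
```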

    3. Use Azure Blob Storage only as temporary staging and purge it automatically

    If performance is critical and you can temporarily accept Blob Storage (see the sketch after the pros and cons below):

    • Enable staging and configure Blob Storage as the staging store.

    • Let ADF copy the data via COPY INTO using an external stage.

    • Use the "delete after copy" option to purge the temporary files.

    Pros:

    • Very fast.

    • Suitable for large datasets.

    Cons:

    • Requires temporary use of Blob Storage.

    • Slightly higher operational complexity.
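
    For comparison, this is roughly how the same copy activity looks with staged copy turned on. Again a hedged sketch: the staging linked service name and path are placeholders, and you would still need your real source and sink datasets.

```python
# Sketch of a Copy activity using Blob Storage as interim staging (placeholder names).
copy_activity_staged = {
    "name": "CopySnowflakeToAdlsStaged",
    "type": "Copy",
    "typeProperties": {
        "source": {"type": "SnowflakeSource", "query": "SELECT * FROM MY_DB.MY_SCHEMA.MY_TABLE"},
        "sink": {"type": "ParquetSink"},
        "enableStaging": True,
        "stagingSettings": {
            # Blob Storage linked service used only to hold interim files during the copy.
            "linkedServiceName": {
                "referenceName": "AzureBlobStagingLinkedService",
                "type": "LinkedServiceReference",
            },
            # Interim container/folder; the temporary files are purged after the copy completes.
            "path": "adf-staging/snowflake-temp",
        },
    },
}
```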

    4. Alternative: use Data Flows (if you want to avoid staging entirely)

    • Mapping Data Flows can connect directly to Snowflake via JDBC.

    • They can read from Snowflake and write to ADLS Gen2 without external staging.

    • Behind the scenes, Data Flows run Spark-based transformation logic.

    • Slower for bulk data than COPY INTO, but no Blob Storage is needed.
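
    If you go the Data Flow route, the pipeline side is just an Execute Data Flow activity pointing at a data flow you author in ADF Studio. The sketch below assumes a hypothetical data flow named "SnowflakeToAdlsFlow"; the compute settings are illustrative.

```python
# Sketch of an Execute Data Flow activity that runs a Mapping Data Flow reading from
# Snowflake and writing to ADLS Gen2. "SnowflakeToAdlsFlow" is a placeholder name.
execute_data_flow_activity = {
    "name": "RunSnowflakeToAdlsFlow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {"referenceName": "SnowflakeToAdlsFlow", "type": "DataFlowReference"},
        # Spark cluster sizing for the data flow run (illustrative values).
        "compute": {"computeType": "General", "coreCount": 8},
    },
}
```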

    Please 'Upvote' (thumbs up) and 'Accept as answer' if the reply was helpful. This will benefit other community members who face the same issue.

