Using Snowflake as a Source Dataset in Azure Data Factory without Blob Storage

Gauri Joshi 0 Reputation points
2025-06-17T08:43:10.1866667+00:00

In Azure Data Factory, I am using a Copy Data activity with a Snowflake source dataset and an ADLS Gen2 sink. However, I am running into problems when configuring the source dataset to work without Azure Blob Storage.

Enabling staging causes compatibility issues with ADLS Gen2, while disabling staging leads to failures unless Azure Blob Storage is used. What are the best practices or solutions for using a Snowflake dataset directly as a source without relying on Blob Storage?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Vinodh247 34,661 Reputation points MVP Volunteer Moderator
    2025-06-17T08:59:44.3166667+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    Why is Blob Storage typically required?

    • When staging is enabled, ADF uses Snowflake's COPY INTO command with an external stage, which requires Blob Storage; ADLS Gen2 is not natively supported as a staging location.

    • When staging is disabled, ADF falls back to row-by-row JDBC-based extraction, which is very slow and prone to timeouts or failures with large datasets.


    Suggestions & best practices:

    1. Use JDBC-based copy (disable staging) for small/medium datasets

    In the copy activity settings, disable staging (set "Enable staging" = false); a minimal sketch follows the limitations below.

    This works without Blob Storage, but it is only suitable for:

    • Smaller datasets (up to a few million rows, depending on row size).

    • Lower throughput needs.

    Limitations:

    • Slower due to row-by-row reads.

    • No parallelism.

    • May hit timeouts with large datasets.
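
    For reference, here is a minimal sketch of what the copy activity could look like with staging disabled, written as a Python dict that mirrors the JSON shown in ADF Studio's "Code" view. The dataset names and the query are placeholders, not values from your factory.

```python
import json

# Minimal sketch of a Copy activity with staging disabled (all names are placeholders).
# This mirrors the JSON you would see in the pipeline's "Code" view in ADF Studio.
copy_activity_no_staging = {
    "name": "CopySnowflakeToAdls",
    "type": "Copy",
    "inputs": [{"referenceName": "SnowflakeSourceDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "AdlsGen2SinkDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {
            "type": "SnowflakeSource",
            # Query pushed down to Snowflake; rows stream back through the integration runtime.
            "query": "SELECT * FROM MY_DB.MY_SCHEMA.MY_TABLE",
        },
        "sink": {"type": "ParquetSink"},
        # No interim Blob Storage is referenced anywhere.
        "enableStaging": False,
    },
}

print(json.dumps(copy_activity_no_staging, indent=2))
```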

    2. Use a self-hosted integration runtime (SHIR) with staging disabled

    • If you use a SHIR for your copy activity, set staging = false.

    • This avoids Azure-hosted IR bottlenecks and gives better throughput than the default row-by-row copy.

    • Still no Blob Storage needed, and performance improves over the default auto-resolve IR. A sketch of a Snowflake linked service bound to a SHIR follows.
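
    As an illustration, a Snowflake linked service can be pinned to a SHIR through its "connectVia" property. The sketch below uses placeholder names and a dummy connection string; adapt it to your own runtime and account.

```python
# Sketch of a Snowflake linked service bound to a self-hosted integration runtime.
# "MySelfHostedIR" and the connection string are placeholders, not working values.
snowflake_linked_service = {
    "name": "SnowflakeLinkedService",
    "properties": {
        "type": "Snowflake",
        "typeProperties": {
            "connectionString": "jdbc:snowflake://<account>.snowflakecomputing.com/?db=MY_DB&warehouse=MY_WH",
        },
        # Routes all traffic for this linked service through the self-hosted IR
        # instead of the auto-resolve Azure IR.
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}
```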

    3. Use Azure Blob Storage only as temporary staging and purge it automatically

    If performance is critical and you can temporarily accept Blob Storage (see the sketch after the pros and cons below):

    • Enable staging and configure Blob Storage as the staging store.

    • Let ADF copy the data via COPY INTO using an external stage.

    • Use the "delete after copy" option to purge the temporary files.

    Pros:

    • Very fast.

    • Suitable for large datasets.

    Cons:

    • Requires temporary use of Blob Storage.

    • Slightly higher operational complexity.
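
    For comparison, this is roughly how the same copy activity looks with staged copy turned on. Again a hedged sketch: the staging linked service name and path are placeholders, and you would still need your real source and sink datasets.

```python
# Sketch of a Copy activity using Blob Storage as interim staging (placeholder names).
copy_activity_staged = {
    "name": "CopySnowflakeToAdlsStaged",
    "type": "Copy",
    "typeProperties": {
        "source": {"type": "SnowflakeSource", "query": "SELECT * FROM MY_DB.MY_SCHEMA.MY_TABLE"},
        "sink": {"type": "ParquetSink"},
        "enableStaging": True,
        "stagingSettings": {
            # Blob Storage linked service used only to hold interim files during the copy.
            "linkedServiceName": {
                "referenceName": "AzureBlobStagingLinkedService",
                "type": "LinkedServiceReference",
            },
            # Interim container/folder; the temporary files are purged after the copy completes.
            "path": "adf-staging/snowflake-temp",
        },
    },
}
```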

    4. Alternative: use Data Flows (if you want to avoid staging entirely)

    • Mapping Data Flows can connect directly to Snowflake via JDBC.

    • They can read from Snowflake and write to ADLS Gen2 without external staging.

    • Behind the scenes, Data Flows run Spark-based transformation logic.

    • Slower for bulk data than COPY INTO, but no Blob Storage is needed.
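
    If you go the Data Flow route, the pipeline side is just an Execute Data Flow activity pointing at a data flow you author in ADF Studio. The sketch below assumes a hypothetical data flow named "SnowflakeToAdlsFlow"; the compute settings are illustrative.

```python
# Sketch of an Execute Data Flow activity that runs a Mapping Data Flow reading from
# Snowflake and writing to ADLS Gen2. "SnowflakeToAdlsFlow" is a placeholder name.
execute_data_flow_activity = {
    "name": "RunSnowflakeToAdlsFlow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {"referenceName": "SnowflakeToAdlsFlow", "type": "DataFlowReference"},
        # Spark cluster sizing for the data flow run (illustrative values).
        "compute": {"computeType": "General", "coreCount": 8},
    },
}
```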

    Please 'Upvote' (thumbs up) and 'Accept as answer' if the reply was helpful. This will benefit other community members who face the same issue.

