Hi,
Thanks for reaching out to Microsoft Q&A.
Why Is Blob Storage Typically Required?
- When staging is enabled, ADF uses Snowflake's COPY INTO with an external stage, which requires Blob Storage; ADLS Gen2 is not natively supported as the staging location (a minimal sketch of such a staging linked service follows below).
- When staging is disabled, ADF falls back to row-by-row JDBC-based extraction, which is very slow and prone to timeouts or failures with large datasets.
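For context, the staging location ADF needs here is simply an Azure Blob Storage linked service in the factory. A minimal sketch of what such a linked service definition might look like (the name AzureBlobStaging and the connection-string placeholders are illustrative, not taken from your setup):

```json
{
  "name": "AzureBlobStaging",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<account-key>;EndpointSuffix=core.windows.net"
    }
  }
}
```

The options below differ mainly in whether the copy activity references such a staging store at all.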
Suggestions & Best Practices:
- Use JDBC-based copy (disable staging) for small/medium datasets
In the copy activity's settings, set Enable staging = false (see the sketch after this list).
This works without Blob Storage, but is only suitable for:
- Smaller datasets (up to a few million rows, depending on row size).
- Lower throughput needs.
Limitations:
- Slower due to row-by-row reads.
- No parallelism.
- May hit timeouts with large datasets.
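A minimal sketch of the copy activity JSON for this option, with staging disabled (the activity, dataset, and sink names such as CopySnowflakeToAdlsDirect, SnowflakeTableDataset, and ParquetSink are illustrative placeholders):

```json
{
  "name": "CopySnowflakeToAdlsDirect",
  "type": "Copy",
  "inputs": [ { "referenceName": "SnowflakeTableDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "AdlsGen2ParquetDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "SnowflakeSource" },
    "sink": { "type": "ParquetSink" },
    "enableStaging": false
  }
}
```

With enableStaging set to false, no staging linked service is referenced anywhere in the activity.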
- Use a Self-Hosted Integration Runtime (SHIR) with Staging Disabled
- If you use a SHIR for your copy activity, keep staging disabled (Enable staging = false).
- This avoids Azure-hosted IR bottlenecks: still no Blob Storage needed, and the same row-by-row copy typically performs better than on the default AutoResolve integration runtime (a sketch of the linked service reference follows below).
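The SHIR is selected on the linked service, not on the copy activity. A minimal sketch of a Snowflake linked service routed through a self-hosted IR (the IR name MySelfHostedIR and the connection string are placeholders, and the connector type may differ depending on which Snowflake connector version you use):

```json
{
  "name": "SnowflakeViaShir",
  "properties": {
    "type": "Snowflake",
    "typeProperties": {
      "connectionString": "jdbc:snowflake://<account>.snowflakecomputing.com/?db=<database>&warehouse=<warehouse>"
    },
    "connectVia": {
      "referenceName": "MySelfHostedIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}
```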
- Use Azure Blob Storage only as temp staging and purge automatically
If performance is critical and you can temporarily accept Blob Storage:
- Enable staging and configure a Blob Storage linked service as the staging store.
- Let ADF copy the data via COPY INTO using an external stage.
- ADF automatically purges the interim files from the staging store once the copy completes (a configuration sketch follows the pros and cons below).
Pros:
- Very fast.
- Suitable for large datasets.
Cons:
- Requires temporary use of Blob Storage.
- Slightly higher operational complexity.
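Putting it together, a minimal sketch of the staged copy activity (AzureBlobStaging matches the example linked service sketched earlier, and the other names and the staging path are placeholders):

```json
{
  "name": "CopySnowflakeToAdlsStaged",
  "type": "Copy",
  "inputs": [ { "referenceName": "SnowflakeTableDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "AdlsGen2ParquetDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "SnowflakeSource" },
    "sink": { "type": "ParquetSink" },
    "enableStaging": true,
    "stagingSettings": {
      "linkedServiceName": {
        "referenceName": "AzureBlobStaging",
        "type": "LinkedServiceReference"
      },
      "path": "adf-staging"
    }
  }
}
```

Here the path value is the container (optionally followed by a folder) in the staging account where ADF writes the interim files before cleaning them up.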
- Alternative: Use Data Flows (if you want to avoid staging entirely)
Mapping Data Flows can connect directly to Snowflake via JDBC.
- Can read from Snowflake and write to ADLS Gen2 without external staging.
Behind the scenes, Data Flows use Spark-based transformation logic.
- Slower for bulk data than COPY INTO, but no Blob Storage needed (a minimal pipeline-activity sketch follows below).
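If you go this route, the pipeline runs an Execute Data Flow activity instead of a Copy activity; a minimal sketch (the data flow name SnowflakeToAdlsFlow and the compute sizing are placeholders, and the actual Snowflake source and ADLS Gen2 sink are defined inside the data flow itself):

```json
{
  "name": "RunSnowflakeToAdlsFlow",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": {
      "referenceName": "SnowflakeToAdlsFlow",
      "type": "DataFlowReference"
    },
    "compute": {
      "coreCount": 8,
      "computeType": "General"
    }
  }
}
```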
Please 'Upvote' (thumbs-up) and 'Accept as answer' if the reply was helpful. This will benefit other community members who face the same issue.