
MohammedArshadAlikhan-9926 asked KranthiPakala-MSFT commented

Question on Copy Activity in ADF and another one on Polybase

  1. What is an example case where 'Enable staging' is needed in the settings of the Copy activity in ADF? I would like a few simple examples to understand its significance.

  2. What is PolyBase, and what is its significance?


azure-data-factory

Hello @MohammedArshadAlikhan-9926,

Just checking in to see whether the information below was helpful. If it answers your question, please click Accept Answer and up-vote it. If you have any further questions, do let us know.


1 Answer

KranthiPakala-MSFT answered

Hello @MohammedArshadAlikhan-9926,

Thanks for the question and for using the Microsoft Q&A platform.


What is an example case where 'Enable staging' is needed in the settings of the Copy activity in ADF? I would like a few simple examples to understand its significance.

When you load a large amount of data into a data warehouse, you can run into performance, throughput, and data format issues. Staged copy helps address these and is especially useful in the following cases:

  • Use Case 1: When you want to ingest data from various data stores into Azure Synapse Analytics via PolyBase, copy data from/to Snowflake, or ingest data from Amazon Redshift/HDFS performantly.
    - Staged copy by using PolyBase to load data into Azure Synapse Analytics: If your source data store and format aren't natively supported by PolyBase, use the staged copy by using PolyBase feature instead. Staged copy also provides better throughput. It automatically converts the data into a PolyBase-compatible format, stores the data in Azure Blob storage, then calls PolyBase to load the data into Azure Synapse Analytics.
    - Staged copy to Snowflake: When your source data store or format is not natively compatible with the Snowflake COPY command, enable the built-in staged copy using an interim Azure Blob storage instance. Staged copy also provides better throughput. The service automatically converts the data to meet Snowflake's data format requirements, invokes the COPY command to load the data into Snowflake, and finally cleans up the temporary data from Blob storage.

  • Use Case 2: When you don't want to open ports other than port 80 and port 443 in your firewall because of corporate IT policies. For example, when you copy data from an on-premises data store to Azure SQL Database or Azure Synapse Analytics, you normally need to allow outbound TCP communication on port 1433 in both the Windows firewall and your corporate firewall. In this scenario, staged copy can use the self-hosted integration runtime to first copy the data to a staging store over HTTP or HTTPS on port 443, then load the data from staging into SQL Database or Azure Synapse Analytics. In this flow, you don't need to open port 1433.

  • Use Case 3: A hybrid data movement (that is, a copy from an on-premises data store to a cloud data store) can take a while over a slow network connection. To improve performance, you can use staged copy to compress the data on-premises so that it takes less time to move it to the staging data store in the cloud. The data is then decompressed in the staging store before it is loaded into the destination data store.
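
To make the 'Enable staging' setting more concrete, here is a minimal Python sketch that builds the relevant part of a Copy activity definition as a dictionary and prints it as JSON. The property names (enableStaging, stagingSettings, linkedServiceName, path, enableCompression, allowPolyBase) follow the Copy activity JSON as I understand it from the ADF documentation. The activity name, linked service name, and staging path are hypothetical placeholders, and the inputs/outputs sections are left out for brevity, so treat this as an illustration rather than a complete pipeline.

```python
import json


def build_staged_copy_activity(staging_linked_service, staging_path,
                               enable_compression=False):
    """Return a (partial) Copy activity definition with staged copy enabled.

    All names passed in are placeholders; dataset references are omitted.
    """
    return {
        "name": "CopyOnPremSqlToSynapse",  # hypothetical activity name
        "type": "Copy",
        "typeProperties": {
            "source": {"type": "SqlServerSource"},
            # Use Case 1: load into Azure Synapse Analytics via PolyBase
            "sink": {"type": "SqlDWSink", "allowPolyBase": True},
            # Corresponds to the 'Enable staging' checkbox in the UI
            "enableStaging": True,
            "stagingSettings": {
                "linkedServiceName": {
                    "referenceName": staging_linked_service,
                    "type": "LinkedServiceReference",
                },
                "path": staging_path,
                # Use Case 3: compress data before it crosses a slow link
                "enableCompression": enable_compression,
            },
        },
    }


activity = build_staged_copy_activity(
    "MyStagingBlobStorage",      # hypothetical Blob storage linked service
    "stagingcontainer/adf-temp",  # hypothetical container/folder
    enable_compression=True,
)
print(json.dumps(activity, indent=2))
```

With staging enabled, the self-hosted integration runtime writes to the staging Blob storage over HTTPS (Use Case 2), and the service then loads from staging into the sink.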


    What is PolyBase, and what is its significance?

  • PolyBase is a transparent access layer that provides connectivity between the database engine and external data sources containing unstructured or semi-structured data. It is optimized for data warehouse workloads and analytical query processing, which makes it easier to bring big data into SQL Server. It accesses and combines both non-relational and relational data, all from within SQL Server.

  • The main significance of PolyBase in Parallel Data Warehouse (PDW) is the ability to combine relational and non-relational data into a single result set. There are several other benefits as well.

  • Queries against HDFS return results faster. PolyBase performs read and write operations in parallel by taking advantage of the massively parallel processing (MPP) architecture of PDW.

  • We can query data in Hadoop using T-SQL from SQL Server or PDW.

  • We can query data in Azure Blob Storage using T-SQL from SQL Server.

  • We can import data into SQL Server from Hadoop, Azure Blob Storage, or Azure Data Lake Store.

  • We can export data from SQL Server to Hadoop, Azure Blob Storage, or Azure Data Lake Store.

  • We can use PolyBase with third-party tools supported by Microsoft BI or SQL Server.

  • We can use PolyBase to access Oracle, Teradata, or MongoDB.
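
As a rough illustration of querying Azure Blob Storage with T-SQL, the Python sketch below only assembles the T-SQL statements that a PolyBase setup typically involves: an external data source, an external file format, an external table, and a query against that table. It does not connect to any server. All object names (StagingBlob, StagingCredential, CsvFormat, dbo.SalesExternal), the column list, and the storage account are hypothetical, and TYPE = HADOOP assumes a SQL Server 2016-2019 or Synapse dedicated SQL pool style of PolyBase; other versions may need different options.

```python
from typing import List


def polybase_external_table_sql(container: str, account: str,
                                folder: str) -> List[str]:
    """Return example T-SQL statements for reading CSV files in Blob storage.

    Every object name below is a placeholder chosen for illustration.
    """
    location = f"wasbs://{container}@{account}.blob.core.windows.net"
    return [
        # 1. Where the external data lives (credential assumed to exist)
        "CREATE EXTERNAL DATA SOURCE StagingBlob "
        f"WITH (TYPE = HADOOP, LOCATION = '{location}', "
        "CREDENTIAL = StagingCredential);",
        # 2. How the files are encoded
        "CREATE EXTERNAL FILE FORMAT CsvFormat "
        "WITH (FORMAT_TYPE = DELIMITEDTEXT, "
        "FORMAT_OPTIONS (FIELD_TERMINATOR = ','));",
        # 3. A table definition that points at the files
        "CREATE EXTERNAL TABLE dbo.SalesExternal "
        "(SaleId INT, Amount DECIMAL(18, 2)) "
        f"WITH (LOCATION = '{folder}', DATA_SOURCE = StagingBlob, "
        "FILE_FORMAT = CsvFormat);",
        # 4. The external data can now be queried like a regular table
        "SELECT COUNT(*) FROM dbo.SalesExternal;",
    ]


statements = polybase_external_table_sql("sales", "mystorageaccount", "/2021/")
for statement in statements:
    print(statement)
```

Once the external table exists, it can be joined with regular tables, which is how relational and non-relational data end up in a single result set.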


Here are related helpful documents:

  1. Staged copy to Snowflake

  2. Use PolyBase to load data into Azure Synapse Analytics

  3. Performance feature - Staged Copy

  4. What is Polybase?

  5. Polybase for beginners.

Hope this will help.



