Synapse Link vs Data Factory vs Polybase

kosmos 246 Reputation points
2022-09-19T13:45:14.56+00:00

My objective is to build a good automated ETL process to load data into Synapse DWH - Data Vault model.

My data is stored in Cosmos DB and Azure SQL DB.

There are many options to load data into Synapse DWH:

  • PolyBase: used for data virtualization. It offers no job scheduling, orchestration, logging, or handling of multiple loads, so it is not interesting for my use case
  • Data Factory: pipeline orchestration
  • Synapse Link: direct connection between the operational stores and Synapse DWH

At first glance, I would say Synapse Link (for Cosmos DB and Azure SQL DB) is the way to go. But I wonder in which cases I would then use Data Factory.

Thanks in advance


Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 89,571 Reputation points Microsoft Employee
    2022-09-20T06:33:51.08+00:00

    Hello @kosmos ,

    Thanks for the question and using MS Q&A platform.

Since your data is stored in Cosmos DB and Azure SQL DB:

    Azure Data Factory and Synapse pipelines support three ways to load data into Azure Synapse Analytics.

    • Use COPY statement
    • Use PolyBase
    • Use bulk insert

    Note: The fastest and most scalable ways to load data are the COPY statement and PolyBase.

    The two options labeled "PolyBase" and "COPY command" are only applicable to Azure Synapse Analytics. Both are fast loading methods that stage data in Azure Storage (if it is not already there) and use a fast, highly parallel load from storage to each compute node. Especially on large tables these options are preferred due to their scalability, but they do come with some restrictions documented by Microsoft.
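    To make the COPY path concrete, here is a minimal sketch of a COPY statement run against a dedicated SQL pool. The storage account URL, container path, and table name are placeholders for illustration, not values from this thread:

    ```sql
    -- Sketch: load staged Parquet files from ADLS Gen2 into a staging table
    -- in a Synapse dedicated SQL pool. All names below are placeholders.
    COPY INTO dbo.stg_customer
    FROM 'https://mystorageaccount.dfs.core.windows.net/staging/customer/*.parquet'
    WITH (
        FILE_TYPE = 'PARQUET',
        CREDENTIAL = (IDENTITY = 'Managed Identity')
    );
    ```

    Compared to PolyBase, COPY does not require pre-creating external tables or file format objects, which is why it is usually the simpler of the two fast options.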

    In contrast, on Azure Synapse Analytics a bulk insert is a slower load method which loads data through the control node and is not as highly parallel or performant. It is an order of magnitude slower on large files. But it can be more forgiving in terms of data types and file formatting.

    On other Azure SQL databases, bulk insert is the preferred and fast method.

    For more details, refer to Azure Synapse Analytics as sink.
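    In a Data Factory (or Synapse) pipeline, the choice between these load methods is made on the Copy activity sink. A rough sketch of the relevant fragment, with linked service and activity names as placeholders:

    ```json
    {
        "name": "CopyToSynapse",
        "type": "Copy",
        "typeProperties": {
            "source": { "type": "AzureSqlSource" },
            "sink": {
                "type": "SqlDWSink",
                "allowCopyCommand": true
            },
            "enableStaging": true,
            "stagingSettings": {
                "linkedServiceName": {
                    "referenceName": "StagingBlobStorage",
                    "type": "LinkedServiceReference"
                }
            }
        }
    }
    ```

    Setting "allowPolyBase" instead of "allowCopyCommand" selects PolyBase; leaving both off falls back to bulk insert.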

    Synapse Link brings together Azure Cosmos DB/SQL analytical store with Azure Synapse Analytics runtime support. This integration enables you to build cloud native HTAP (Hybrid transactional/analytical processing) solutions that generate insights based on real-time updates to your operational data over large datasets. It unlocks new business scenarios to raise alerts based on live trends, build near real-time dashboards, and business experiences based on user behavior.
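    Once Synapse Link is enabled, the analytical store can be queried directly, for example from Synapse serverless SQL. A sketch, where the account, database, container, and credential names are placeholders:

    ```sql
    -- Sketch: query the Cosmos DB analytical store from Synapse serverless SQL.
    -- Account, database, container, and credential names are placeholders.
    SELECT TOP 100 *
    FROM OPENROWSET(
        PROVIDER = 'CosmosDB',
        CONNECTION = 'Account=myCosmosAccount;Database=mydb',
        OBJECT = 'orders',
        SERVER_CREDENTIAL = 'myCosmosCredential'
    ) AS orders;
    ```

    This is the "no-ETL" path: the query reads the column store replicated from the transactional store, without a pipeline moving the data.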

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click "Accept Answer" or upvote whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and helping shape the future of Q&A: here is how you can be part of Q&A Volunteer Moderators
    2 people found this answer helpful.
