Hello @Anonymous ,
There are multiple methods to load Delta format data from ADLS to Synapse. The optimal, performant, and cost-effective approach to load Delta format data from ADLS to Synapse depends on your specific requirements and use case.
Here are some of them.
PolyBase External Tables: PolyBase External Tables allow you to create an external table in Synapse using PolyBase to connect to the Delta files in ADLS. You can then query the external table in Synapse using T-SQL statements.
This approach is cost-effective since it doesn't require any data movement. However, performance may be slower compared to other options since the data is read directly from the ADLS files.
Pros:
- No data movement, which can reduce costs
- Simple to set up and use
- Suitable for simple use cases where performance is not critical
Cons:
- It may not be as performant as other options since the data is read directly from ADLS files
- Limited functionality compared to other options
Reference document: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview
Azure Data Factory: Azure Data Factory can be used to copy data from ADLS to Synapse. ADF provides a Delta format connector that you can use to copy data in Delta format from ADLS to Synapse.
This approach allows for more flexibility since you can transform and clean the data during the copy process. However, it involves data movement, which can increase costs.
Pros:
- Flexible and customizable
- Suitable for complex use cases with large amounts of data
- Can transform and clean data during the copy process
Cons:
- Involves data movement, which can increase costs
- May have higher latency compared to other options
Reference document: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow-delta-lake
Azure Databricks: Azure Databricks can be used to load data from Delta files in ADLS to Synapse. This involves using the Azure Synapse Spark connector to write data to Synapse.
This highly performative approach allows for transformations and data processing using Spark. However, it may be more expensive compared to other options since it involves running a Databricks cluster.
Pros:
- Highly performant
- Allows for complex transformations and processing of data using Spark
- Suitable for large-scale data processing
Cons:
- More expensive compared to other options since it involves running a Databricks cluster
- It may have a steeper learning curve compared to other options
Reference document: https://learn.microsoft.com/en-us/azure/databricks/delta/
Azure Synapse Studio: Azure Synapse Studio is an integrated workspace that provides a unified experience for developing and managing big data and data warehousing solutions. Azure Synapse Studio can load data in Delta format from ADLS to Synapse.
You can either use the built-in data ingestion tools or write custom code to load the data. This approach provides a user-friendly interface and allows for customization, but it may not be as performant as using Azure Databricks.
Each method has its own pros and cons, and the optimal method depends on your specific use case and requirements.
Pros:
- User-friendly interface
- Allows for customization
- Suitable for simple use cases
Cons:
- It may not be as performant as other options
- Limited functionality compared to other options
Reference document: https://learn.microsoft.com/en-us/azure/synapse-analytics/quickstart-load-studio-sql-pool
I hope this helps. Please let me know if you have any further questions,