Ingesting data at scale

Large-scale data movement is a concern on many projects, for both hot and cold data paths, and regardless of the intent it serves: copying or loading data, or transferring it (copying the data and then deleting it from the source).

When transferring large volumes of data for analytics or migrations, polyglot persistence becomes a core requirement for balancing cost, performance, flexibility, and retention. In practice, this means the same solution might combine different data storage and engine choices to address different stages of the data lifecycle. For example, when data is archived or retained longer for compliance reasons, less expensive storage can be used. Similarly, deciding whether and how quickly that data must be made available influences the choice of storage, database engine, or compute.

Explore tech-specific samples

This section lists resources and assets that have successfully addressed transferring large volumes of data.

Ingest large-scale data from Azure Data Lake to Azure SQL using Databricks

The following resources showcase how to use the Apache Spark Connector for SQL Server and Azure SQL for large-scale data loading.
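As a rough sketch of what such a load looks like in a Databricks notebook, the snippet below builds the write options for the Apache Spark Connector for SQL Server and Azure SQL. The server, database, table, and credential values are placeholders, and the specific tuning values (batch size, table lock) are illustrative assumptions to adjust for your workload.

```python
def sql_spark_write_options(server, database, table, user, password):
    """Build the option map for writing a Spark DataFrame to Azure SQL
    with the Apache Spark Connector for SQL Server and Azure SQL.
    All connection values passed in here are placeholders."""
    return {
        "url": f"jdbc:sqlserver://{server}:1433;databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Illustrative bulk-load tuning: larger batches and a table
        # lock generally speed up large writes.
        "batchsize": "10000",
        "tableLock": "true",
    }

# In a Databricks notebook the options would be applied roughly like:
#   (df.write
#      .format("com.microsoft.sqlserver.jdbc.spark")
#      .mode("overwrite")
#      .options(**opts)
#      .save())
opts = sql_spark_write_options(
    "myserver.database.windows.net", "mydb", "dbo.Sales",
    "loader", "<secret>")
print(opts["url"])
```

For production use, credentials should come from a secret scope rather than being embedded in the notebook.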

Load data fast in Azure SQL using Smart Bulk Copy
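The core idea behind this kind of bulk copy is partition parallelism: split the source into non-overlapping ranges and copy the ranges concurrently. The sketch below illustrates that idea on in-memory lists only; it is not the tool's actual implementation or configuration, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_partition(source, dest, lo, hi):
    # Each worker bulk-copies one contiguous slice of rows.
    dest[lo:hi] = source[lo:hi]


def parallel_copy(source, dest, workers=4):
    """Copy `source` into the preallocated `dest` by splitting it into
    one contiguous range per worker and copying the ranges
    concurrently -- the partition-parallel pattern that fast bulk-copy
    tools apply to SQL tables."""
    n = len(source)
    step = (n + workers - 1) // workers  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for lo in range(0, n, step):
            pool.submit(copy_partition, source, dest, lo, min(lo + step, n))
    # The executor's context manager waits for all workers to finish.
    return dest
```

Because the ranges never overlap, the workers need no coordination beyond joining at the end; on a real database, each range would map to a keyset or physical partition of the source table.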

For more information