Fast Data Loading in Azure SQL DB using Azure Databricks
Azure Databricks and Azure SQL Database work very well together. This repo shows how to use the latest connector to load data into Azure SQL as fast as possible, using table partitions, columnstore indexes, and the well-known best practices described in:
- Partitioned Tables and Indexes
- Columnstore indexes: Overview
- Columnstore indexes - Data loading guidance
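The columnstore data loading guidance boils down to two numbers: a bulk-insert batch of at least 102,400 rows is compressed directly into a rowgroup, while smaller batches land in the delta store, and a single rowgroup holds at most 1,048,576 rows. The Python sketch below turns those thresholds into a batch-size choice and a set of writer options. It assumes that "the latest connector" is the Apache Spark connector for SQL Server and Azure SQL (format `com.microsoft.sqlserver.jdbc.spark`, with its `batchsize` and `tableLock` options); the helper names `choose_batch_size`, `lands_in_delta_store`, `connector_write_options` and `write_to_azure_sql` are hypothetical, and authentication options are left out.

```python
# Thresholds from the "Columnstore indexes - Data loading guidance" article.
MIN_DIRECT_COMPRESS_ROWS = 102_400  # smaller batches land in the delta store
MAX_ROWGROUP_ROWS = 1_048_576       # upper limit of rows per compressed rowgroup


def choose_batch_size(rows_per_partition: int) -> int:
    """Hypothetical helper: one batch per Spark partition, capped at a full rowgroup."""
    if rows_per_partition <= 0:
        raise ValueError("rows_per_partition must be positive")
    return min(rows_per_partition, MAX_ROWGROUP_ROWS)


def lands_in_delta_store(batch_size: int) -> bool:
    """True when a bulk-insert batch is too small to be compressed directly."""
    return batch_size < MIN_DIRECT_COMPRESS_ROWS


def connector_write_options(url: str, table: str, batch_size: int,
                            table_lock: bool = True) -> dict:
    """Hypothetical helper: options for the (assumed) Spark connector for Azure SQL.

    Credentials (user/password or an access token) are deliberately left out.
    """
    return {
        "url": url,
        "dbtable": table,
        "batchsize": str(batch_size),
        # Ask the connector to bulk insert with the TABLOCK hint.
        "tableLock": "true" if table_lock else "false",
    }


def write_to_azure_sql(df, options: dict) -> None:
    """Append a Spark DataFrame; needs the connector library installed on the cluster."""
    (df.write
       .format("com.microsoft.sqlserver.jdbc.spark")
       .mode("append")
       .options(**options)
       .save())
```

In a notebook, a rough rows-per-partition estimate is `df.count() // df.rdd.getNumPartitions()`; if that value makes `lands_in_delta_store` return true, repartitioning into fewer, larger partitions before writing is one option to consider.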
Samples
All the samples start from a partitioned Parquet file containing data generated with the well-known TPC-H benchmark tools. Free tools to generate a dataset of whatever size you want are available on the TPC-H website.
Once the Parquet file is available, the samples guide you through the most common scenarios:
- Loading a non-partitioned table
- Loading a partitioned table
- Loading a partitioned table via switch-in
All samples also show how to load a table correctly when it already has indexes, or when you want to use a columnstore index in Azure SQL.
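The switch-in scenario generally works like this: load the data into a non-partitioned staging table that has the same schema and indexes as the target, add a trusted check constraint proving that every row fits one partition, then switch the staging table into that (empty) partition as a metadata-only operation. The Python sketch below generates that T-SQL. It assumes a RANGE RIGHT partition function over a hypothetical integer partition key; `partition_number` and `switch_in_sql` are hypothetical helpers, not code from the samples.

```python
from bisect import bisect_right


def partition_number(boundaries: list[int], value: int) -> int:
    """1-based partition number for a RANGE RIGHT function, like $PARTITION.

    With RANGE RIGHT, each boundary value belongs to the partition on its right.
    """
    return bisect_right(boundaries, value) + 1


def switch_in_sql(staging: str, target: str, column: str,
                  boundaries: list[int], partition: int,
                  constraint: str = "ck_switch_range") -> list[str]:
    """Hypothetical helper: T-SQL to switch a loaded staging table into one partition.

    Assumes the staging table already matches the target's schema and indexes
    and that the target partition is empty.
    """
    if not 1 <= partition <= len(boundaries) + 1:
        raise ValueError("partition number out of range")
    predicates = [f"{column} IS NOT NULL"]
    if partition > 1:
        # The lower boundary is inclusive for RANGE RIGHT.
        predicates.append(f"{column} >= {boundaries[partition - 2]}")
    if partition <= len(boundaries):
        # The upper boundary belongs to the next partition, so it is exclusive.
        predicates.append(f"{column} < {boundaries[partition - 1]}")
    check = " AND ".join(predicates)
    return [
        # A trusted (WITH CHECK) constraint proves every row fits the target partition.
        f"ALTER TABLE {staging} WITH CHECK ADD CONSTRAINT {constraint} CHECK ({check});",
        # Metadata-only operation: no rows are copied.
        f"ALTER TABLE {staging} SWITCH TO {target} PARTITION {partition};",
    ]
```

For example, with boundaries `[10, 20]` a staging table holding keys from 10 to 19 maps to partition 2, and `switch_in_sql("dbo.stage", "dbo.target", "k", [10, 20], 2)` returns the constraint and the `SWITCH` statement for it.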
Bonus Samples: Reading data as fast as possible
Though this repo focuses on writing data into Azure SQL as fast as possible, I also understand that you may want to know how to do the opposite: how to read data as fast as possible from Azure SQL into Apache Spark / Azure Databricks. For this reason, the folder notebooks/read-from-azure-sql contains two samples that show how to do exactly that.
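A common way to read quickly is to let Spark's JDBC data source split the query into parallel range scans with the `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` options, which must be specified together. The Python sketch below builds those options; it is not necessarily what the two samples do, `parallel_read_options` and `read_from_azure_sql` are hypothetical helpers, and authentication options are omitted.

```python
def parallel_read_options(url: str, table: str, column: str,
                          lower: int, upper: int, num_partitions: int) -> dict:
    """Hypothetical helper: options for a partitioned read with Spark's JDBC source.

    Spark uses lowerBound/upperBound only to compute the partition stride;
    rows outside the range are still read (by the first and last partitions).
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,  # must be a numeric, date or timestamp column
        "lowerBound": str(lower),
        "upperBound": str(upper),
        # Also caps the number of concurrent JDBC connections to Azure SQL.
        "numPartitions": str(num_partitions),
    }


def read_from_azure_sql(spark, options: dict):
    """Each of the numPartitions tasks runs its own range query in parallel."""
    return spark.read.format("jdbc").options(**options).load()
```

Choosing bounds close to the real minimum and maximum of an indexed column keeps the ranges balanced, so no single task ends up scanning most of the table.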
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.