Is there any limitation on writing billions of records from Azure Databricks to Azure SQL using JDBC?

manish verma 481 Reputation points
2020-10-08T14:37:03.543+00:00

Hi All,

We are following this URL:
https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases

We want to know whether there is any limitation on writing billions of records from Azure Databricks to Azure SQL using JDBC.

Our requirement is very simple: we want to write DataFrame data to an Azure SQL table, roughly as in the sketch below.
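For reference, this is a minimal sketch of the kind of generic JDBC write we are doing from PySpark on Databricks; the server, database, table, and credential values are placeholders.

```python
# Minimal sketch of a generic JDBC write from a Databricks DataFrame to Azure SQL.
# Server, database, table and credential values below are placeholders.
jdbc_url = (
    "jdbc:sqlserver://<server>.database.windows.net:1433;"
    "database=<database>;encrypt=true;loginTimeout=30"
)

(df.write                                   # df is the existing DataFrame to persist
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.<target_table>")
   .option("user", "<username>")
   .option("password", "<password>")
   .mode("append")
   .save())
```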


3 answers

  1. HimanshuSinha-msft 19,476 Reputation points Microsoft Employee
    2020-10-08T23:39:27.717+00:00

    Hello @manish verma ,

    Thanks for the question and for using the forum.

    Azure Databricks is architected so that executors can read a lot of data in parallel. On the SQL side, though, you should plan the write carefully and implement partitioning; inserting a huge amount of data into the same table at the same time without it can turn out to be a nightmare (a minimal sketch follows at the end of this answer). There is no limitation as such.

    Thanks, Himanshu
    Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.
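    To illustrate the partitioning point above (a sketch only, not an official recommendation; connection values are placeholders), the generic JDBC writer lets you cap the number of parallel connections and the rows per insert batch:

    ```python
    # Sketch: bound write parallelism and batch size for the generic JDBC writer.
    # "numPartitions" caps concurrent JDBC connections; "batchsize" sets rows per insert batch.
    (df.write
       .format("jdbc")
       .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>")
       .option("dbtable", "dbo.<target_table>")
       .option("user", "<username>")
       .option("password", "<password>")
       .option("numPartitions", 32)   # Spark coalesces to at most 32 write partitions/connections
       .option("batchsize", 10000)    # rows sent per JDBC batch insert
       .mode("append")
       .save())
    ```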


  2. David Browne - msft 3,851 Reputation points
    2020-10-09T19:10:35.64+00:00

    There are purpose-built Spark Connectors for SQL Server and Azure SQL Database and for Azure Synapse SQL Pools (SQL DW).

    You should use these for large loads, rather than the generic JDBC Spark Connector. But the JDBC Connector will work; it just may be slower than the purpose-built ones.
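    For example, with the Apache Spark connector for SQL Server and Azure SQL installed on the cluster, switching to it is mostly a format change (a sketch; table and credential values are placeholders, and available bulk-copy options vary by connector version):

    ```python
    # Sketch using the dedicated SQL Server / Azure SQL Spark connector, which performs
    # bulk-copy loads instead of row-by-row JDBC inserts. The connector library must be
    # installed on the Databricks cluster first.
    (df.write
       .format("com.microsoft.sqlserver.jdbc.spark")
       .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>")
       .option("dbtable", "dbo.<target_table>")
       .option("user", "<username>")
       .option("password", "<password>")
       .mode("append")
       .save())
    ```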


  3. Ragunathan Ramanujam 6 Reputation points
    2020-10-11T12:48:47.307+00:00

    Hi @Verma, Manish Kumar /@manish verma

    With all due respect, I would like to share some thoughts about your requirement. My understanding is that you are reading billions of records from the source (assuming the source data arrives as files) using Databricks and writing them into SQL Server Hyperscale.
    I could not find any limitation in the JDBC driver itself for writing billions of records; issues start to occur when all of the source records cannot fit into the Databricks Spark cluster memory, or when the writing technique is poor. As developers, we need to come up with better design patterns to handle large volumes of data inside Databricks. One of the better design patterns, which @HimanshuSinha-msft already suggested, is partitioning the data: this splits the data into multiple partitions and also improves write speed, because each partition's data is written independently (see the sketch at the end of this answer).

    Again with due respect, I am not a Microsoft employee, but I have not seen Microsoft release any partially tested component without providing the relevant documentation for it. We will all try to help if you share the error that occurs when you load large volumes using Databricks, so that we can find a better design approach.
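    As a small illustration of that design point (the partition counts below are only examples), it helps to check how the DataFrame is partitioned before a very large write:

    ```python
    # Sketch: inspect and adjust partitioning before a large JDBC write.
    # Too few partitions -> oversized tasks that may not fit in executor memory;
    # too many -> a flood of small concurrent inserts hitting the same SQL table.
    num_parts = df.rdd.getNumPartitions()
    print(f"DataFrame currently has {num_parts} partitions")

    if num_parts > 200:
        df = df.coalesce(64)       # fewer concurrent writers, without a full shuffle
    elif num_parts < 8:
        df = df.repartition(64)    # spread the data so no single task is too large
    ```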

