Unable to write data from Databricks into the DWH (dedicated SQL pool) when a column in the table is longer than 8000 characters

MadhuVamsi-2459 32 Reputation points
2022-09-22T06:37:58.98+00:00

Hi team ,

We are writing data into the DWH from Databricks after reading it into a dataframe.

We create an empty table with datatypes and columns matching the dataframe, but when a column's length is greater than 8000 we are not able to load that particular column; when we drop that column from the dataframe and load, it works fine. Please let me know if there is another way, or whether I am doing something wrong.
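
For reference, here is a simplified sketch of the kind of write we are doing (table, storage, and credential values are placeholders, assuming the standard Databricks Synapse connector, com.databricks.spark.sqldw):

```python
# Illustrative sketch only -- server, storage, credential, and table names are placeholders.
# Assumes the standard Databricks -> Synapse connector (com.databricks.spark.sqldw).
(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<dwh>;user=<user>;password=<pwd>")
   .option("tempDir", "abfss://<container>@<storage-account>.dfs.core.windows.net/tmp")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.my_target_table")   # pre-created table matching the dataframe
   .mode("append")
   .save())
```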

@PRADEEPCHEEKATLA-MSFT

Thanks and Regards
Bannuru Sri Madhu Vamsi


1 answer

  1. PRADEEPCHEEKATLA-MSFT 76,921 Reputation points Microsoft Employee
    2022-10-10T04:06:34.57+00:00

    Hello @MadhuVamsi-2459 ,

    Apologies for the delay in response.

    This is a problem that goes back 15 or more years, and reviewing your table design may be more effective.
    When designing your table, you want to aim for rows of less than 8,060 bytes so that a row fits on a page.
    If you exceed the in-row (IN_ROW_DATA) limit, SQL Server has to use the ROW_OVERFLOW_DATA allocation unit.

    For the long column you need to use one of the large-value types: varchar(max), nvarchar(max), or varbinary(max). SQL Server then stores the oversized value on a separate page and keeps a 24-byte pointer in the original row.
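
    As a minimal sketch, pre-creating the target table with a large-value type might look like this, here via pyodbc (server, database, table, and column names are placeholders; a heap is used because clustered columnstore tables in dedicated SQL pool do not support the (max) types):

```python
# Sketch only: placeholder server/database/credential values, hypothetical table and column names.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<dwh>;UID=<user>;PWD=<pwd>",
    autocommit=True,
)
# NVARCHAR(MAX) for the long column; heap distribution because clustered
# columnstore tables in dedicated SQL pool do not support the (max) types.
conn.cursor().execute("""
    CREATE TABLE dbo.my_target_table (
        id INT NOT NULL,
        big_text_col NVARCHAR(MAX) NULL
    )
    WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);
""")
conn.close()
```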

    The 2 GB limit for these types applies per value, not to the table as a whole. According to the docs, the total size of a columnstore table is unlimited... so a table of 80 GB should be fine.

    For the load itself I would suggest trying a different approach: shoving that volume of data through an ancient JDBC connection will be troublesome. The recommended pattern for moving data from Databricks to Azure Synapse is to use the Azure Synapse Dedicated SQL Pool Connector for Apache Spark.
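
    A sketch of what a write through that connector looks like in PySpark, based on the connector's documentation for Synapse Spark pools (exact imports and options can vary by version, and the workspace, database, and table names below are placeholders):

```python
# Sketch based on the documented PySpark API of the Azure Synapse Dedicated SQL Pool
# Connector for Apache Spark (runs on Synapse Spark pools); names are placeholders.
import com.microsoft.spark.sqlanalytics
from com.microsoft.spark.sqlanalytics.Constants import Constants

(df.write
   .option(Constants.SERVER, "<workspace-name>.sql.azuresynapse.net")
   .mode("overwrite")
   .synapsesql("<database>.<schema>.<table>"))
```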

    I would start by looking at the physical length of the data in the column and setting the type appropriately if you can.
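
    For instance, a quick way to check the longest value in the problematic column from the Databricks side (the column name is a placeholder):

```python
# Check the maximum physical string length in the offending column (name is a placeholder).
from pyspark.sql import functions as F

df.select(F.max(F.length(F.col("big_text_col"))).alias("max_len")).show()
```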

    Hope this helps.