ADF Copy activity with PolyBase column size limit (8000 bytes)

Robert Kostecki 6 Reputation points
2020-10-14T12:16:02.553+00:00

There seems to be an error when loading columns with sizes greater than 8000 bytes via ADF Copy with Polybase and Synapse.

Error message:

{  

    "errorCode": "2200",  

    "message": "ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Please make sure SQL DW has access to ADLS Gen2 account,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: String or binary data would be truncated.,Source=.Net SqlClient Data Provider,SqlErrorNumber=107090,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=107090,State=1,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: String or binary data would be truncated.,},],'",  

    "failureType": "UserError",  

    "target": "DeltaCopyToStaging",  

    "details": []  

}  

To replicate:

  1. Execute the following script in Synapse:

DROP TABLE dbo.BigTable
GO

CREATE TABLE dbo.BigTable (C1 VARCHAR(max))
WITH (
distribution = round_robin
,heap
)
GO

DECLARE @Steinar VARCHAR(4001) = 'x';

SET @Steinar = @Steinar + replicate('x', 4001)

INSERT INTO dbo.BigTable
VALUES (@Steinar );

UPDATE dbo.BigTable
SET C1 = C1 + C1

SELECT len(C1)
FROM dbo.BigTable --> length 8002

  1. Export data from the table to parquet file in ADLS Gen2 using ADF Copy Activity (Synapse => Parquet file).
  2. Import data from parquet file generated in step to back to the table using ADF Copy Activity and setting the PolyBase method.

Can you please investigate and (ideally) provide a fix or workaround?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,142 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,136 questions
0 comments No comments
{count} vote

2 answers

Sort by: Most helpful
  1. PRADEEPCHEEKATLA-MSFT 73,886 Reputation points Microsoft Employee
    2020-10-15T11:50:12.017+00:00

    Hello @Robert Kostecki ,

    You are experiencing this error message because the length of the column size defined.

    PolyBase supports a maximum column size of varchar(8000), nvarchar(4000), or varbinary(8000). If you have data that exceeds these limits, one option is to break the data up into chunks when you export it, and then reassemble the chunks after import.

    As per the documentation: char and varchar (Transact-SQL)

    varchar [ ( n | max ) ] Variable-size string data. Use n to define the string size in bytes and can be a value from 1 through 8,000 or use max to indicate a column constraint size up to a maximum storage of 2^31-1 bytes (2 GB).

    If you use char or varchar, we recommend to:

    • Use char when the sizes of the column data entries are consistent.
    • Use varchar when the sizes of the column data entries vary considerably.
    • Use varchar(max) when the sizes of the column data entries vary considerably, and the string length might exceed 8,000 bytes.

    For more details, refer Load data into Azure Synapse

    Hope this helps. Do let us know if you any further queries.

    ------------

    • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification.

  2. Robert Kostecki 6 Reputation points
    2020-10-20T09:15:14.307+00:00

    Hi Pradeep, thank you for a quick reply. I am afraid that the information provided in quotes is incorrect. There is no such limitation in the PolyBase feature documentation polybase-versioned-feature-summary

    You may notice that the limitation states: "The maximum possible row size, which includes the full length of variable length columns, can't exceed 32 KB in SQL Server or 1 MB in Azure Synapse Analytics."

    Furthermore it can be empirically proven that the PolyBase can successfully load columns exceeding the 8000 bytes limit and specifically max columns designed to overcome the 8000 bytes limitation.

    Bearing in mind the above can you please re-evaluate the bug and let me know what the issue is?

    0 comments No comments