Writing from ADLS CSV to Synapse Dedicated SQL Pool - Unexpected error encountered filling record reader buffer: HadoopSqlException: Arithmetic overflow error converting expression to data type NVARCHAR

Brandon Duncan 0 Reputation points
2023-06-05T20:35:10.9266667+00:00

We are copying email records from Dataverse (D365) using Synapse Link for Dataverse, which writes incremental changes to ADLS; from there, a data flow loads them into a Dedicated SQL pool in Synapse.

The email body can be incredibly long, and our table in the SQL pool accounts for this by using a field type of nvarchar(max). However, when copying this data over using a data flow in Synapse, we encounter the following error:

Unexpected error encountered filling record reader buffer: HadoopSqlException: Arithmetic overflow error converting expression to data type NVARCHAR

If we truncate the email body to 10,000 characters, the copy performs perfectly fine.
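
For reference, the target table is shaped roughly like this (an illustrative sketch only; the id type/length and the distribution choice are assumptions, not our exact DDL):

CREATE TABLE dbo.email
(
    id              NVARCHAR(36)  NOT NULL,  -- assumed key type/length
    description     NVARCHAR(MAX) NULL,      -- full email body
    safedescription NVARCHAR(MAX) NULL
)
WITH
(
    DISTRIBUTION = HASH(id),
    HEAP  -- heap rather than columnstore, since dedicated SQL pool clustered columnstore tables don't support NVARCHAR(MAX)
);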

Is there a way to copy the entire email body over without having to restrict the length? I understand that PolyBase previously had a 1 MB limitation and that the Hadoop path still has it. Is this something we are stuck with, or do we need to research other methods of copying this data over?

Tags: Azure Data Lake Storage, Azure Synapse Analytics

2 answers

  1. ShaikMaheer-MSFT 37,896 Reputation points Microsoft Employee
    2023-06-07T08:52:44.3033333+00:00

    Hi Brandon Duncan,

    Thank you for posting your query on the Microsoft Q&A platform.

    The error message you are seeing indicates that individual rows are too large for the load path that reads the staged files into the dedicated SQL pool, not that the values exceed what an nvarchar(max) column can hold. This is most likely the PolyBase/Hadoop row-size limitation you mentioned, which the delimited text files produced by Synapse Link for Dataverse run into.

    One possible workaround is to split the email body into smaller chunks and store them in separate rows. A data flow in Synapse can split the body into chunks of a fixed size (e.g. 10,000 characters) and then write each chunk as its own row in your SQL table.

    To do this, you can use the substring function in a derived column transformation to split the email body into chunks. Mapping data flows have no loop counter, so in practice you define a fixed number of chunk columns, enough to cover your longest body, for example:

    substring(email_body, 1, 10000)
    substring(email_body, 10001, 10000)
    substring(email_body, 20001, 10000)

    In general, chunk i starts at character (i - 1) * 10000 + 1, so the expression for chunk i is substring(email_body, (i - 1) * 10000 + 1, 10000).

    You can then unpivot the chunk columns into rows and use a sink transformation to write each chunk to a separate row in your SQL table.
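
    If you take this approach, the chunks can be stitched back together when you query the pool. A minimal sketch, assuming a hypothetical chunk table dbo.email_body_chunk (email_id, chunk_index, chunk_text) and that STRING_AGG is available in your dedicated SQL pool:

    -- Reassemble the full body from its 10,000-character chunks.
    -- CAST the input to NVARCHAR(MAX) so the aggregated result can exceed 8,000 bytes.
    SELECT
        email_id,
        STRING_AGG(CAST(chunk_text AS NVARCHAR(MAX)), '')
            WITHIN GROUP (ORDER BY chunk_index) AS email_body
    FROM dbo.email_body_chunk
    GROUP BY email_id;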

    Another possible solution is to stage the data in a file format that handles large values better than delimited text, such as Parquet or ORC, and load the dedicated SQL pool from that instead of the CSV files.

    I hope this helps you find a solution to your issue. Let me know how it goes or if you have any further questions. Thank you.

    1 person found this answer helpful.

  2. Brandon Duncan 0 Reputation points
    2023-10-04T14:48:29.9533333+00:00

    For anyone looking at this later, I did the following to resolve the issue of getting large email bodies into Synapse Dedicated SQL:

    1. Copy the email records to Parquet in the data lake using a data flow.
    2. Insert into a staging table in Synapse dedicated SQL using the COPY command in a Copy activity (see the sketch after the merge below).
    3. Execute a stored procedure that merges the staging table into the target table:
    MERGE dbo.email AS T
    USING stage.stage_email AS S
    ON (T.id = S.id)
    WHEN MATCHED THEN
        UPDATE SET
            T.description = S.description,
            T.safedescription = S.safedescription;
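
    For step 2, the COPY statement looked roughly like this (a sketch only; the storage URL, path, and credential are placeholders rather than our actual values):

    COPY INTO stage.stage_email
    FROM 'https://<storageaccount>.dfs.core.windows.net/<container>/email/*.parquet'
    WITH (
        FILE_TYPE = 'PARQUET',
        CREDENTIAL = (IDENTITY = 'Managed Identity')
    );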
    