Writing from ADLS CSV to Synapse Dedicated SQL Pool - Unexpected error encountered filling record reader buffer: HadoopSqlException: Arithmetic overflow error converting expression to data type NVARCHAR

Brandon Duncan 0 Reputation points
2023-06-05T20:35:10.9266667+00:00

We are copying email records from Dataverse (D365) using Synapse Link for Dataverse, which writes incremental changes to ADLS; from there, a data flow loads them into a Dedicated SQL pool in Synapse.

The email body can be incredibly long, and our table in the SQL pool accounts for this by using a field type of nvarchar(max). However, when copying this data over using a data flow in Synapse, we encounter the following error:

Unexpected error encountered filling record reader buffer: HadoopSqlException: Arithmetic overflow error converting expression to data type NVARCHAR

If we truncate the email body to 10,000 characters, the copy performs perfectly fine.
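
For reference, the target table is shaped roughly like this (an illustrative sketch only; the id type/length and the distribution choice are assumptions, not our exact DDL):

CREATE TABLE dbo.email
(
    id              NVARCHAR(36)  NOT NULL,  -- assumed key type/length
    description     NVARCHAR(MAX) NULL,      -- full email body
    safedescription NVARCHAR(MAX) NULL
)
WITH
(
    DISTRIBUTION = HASH(id),
    HEAP  -- heap rather than columnstore, since dedicated SQL pool clustered columnstore tables don't support NVARCHAR(MAX)
);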

Is there a way to copy the entire email body over without having to restrict the length? I understand that PolyBase previously had a 1 MB limitation and that the Hadoop path still has it. Is this something we are stuck with, or do we need to research other methods of copying this data over?

Tags: Azure Data Lake Storage, Azure Synapse Analytics

2 answers

  1. ShaikMaheer-MSFT 37,896 Reputation points Microsoft Employee
    2023-06-07T08:52:44.3033333+00:00

    Hi Brandon Duncan,

    Thank you for posting your query on the Microsoft Q&A platform.

    The error message you are seeing indicates that individual rows are too large for the load path that reads the staged files into the dedicated SQL pool, not that the values exceed what an nvarchar(max) column can hold. This is most likely the PolyBase/Hadoop row-size limitation you mentioned, which the delimited text files produced by Synapse Link for Dataverse run into.

    One possible workaround is to split the email body into smaller chunks and store them in separate rows. A data flow in Synapse can split the body into chunks of a fixed size (e.g. 10,000 characters) and then write each chunk as its own row in your SQL table.

    To do this, you can use the substring function in a derived column transformation to split the email body into chunks. Mapping data flows have no loop counter, so in practice you define a fixed number of chunk columns, enough to cover your longest body, for example:

    substring(email_body, 1, 10000)
    substring(email_body, 10001, 10000)
    substring(email_body, 20001, 10000)

    In general, chunk i starts at character (i - 1) * 10000 + 1, so the expression for chunk i is substring(email_body, (i - 1) * 10000 + 1, 10000).

    You can then unpivot the chunk columns into rows and use a sink transformation to write each chunk to a separate row in your SQL table.
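
    If you take this approach, the chunks can be stitched back together when you query the pool. A minimal sketch, assuming a hypothetical chunk table dbo.email_body_chunk (email_id, chunk_index, chunk_text) and that STRING_AGG is available in your dedicated SQL pool:

    -- Reassemble the full body from its 10,000-character chunks.
    -- CAST the input to NVARCHAR(MAX) so the aggregated result can exceed 8,000 bytes.
    SELECT
        email_id,
        STRING_AGG(CAST(chunk_text AS NVARCHAR(MAX)), '')
            WITHIN GROUP (ORDER BY chunk_index) AS email_body
    FROM dbo.email_body_chunk
    GROUP BY email_id;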

    Another possible solution is to stage the data in a file format that handles large values better than delimited text, such as Parquet or ORC, and load the dedicated SQL pool from that instead of the CSV files.

    I hope this helps you find a solution to your issue. Let me know how it goes or if you have any further questions. Thank you.

    1 person found this answer helpful.

  2. Brandon Duncan 0 Reputation points
    2023-10-04T14:48:29.9533333+00:00

    For anyone looking at this later, I did the following to resolve the issue of getting large email bodies into Synapse Dedicated SQL:

    1. Copy the email records to Parquet in the data lake using a data flow.
    2. Insert into a staging table in Synapse dedicated SQL using the COPY command in a Copy activity (see the sketch after the merge below).
    3. Execute a stored procedure that merges the staging table into the target table:
    MERGE dbo.email AS T
    USING stage.stage_email AS S
    ON (T.id = S.id)
    WHEN MATCHED THEN
        UPDATE SET
            T.description = S.description,
            T.safedescription = S.safedescription;
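
    For step 2, the COPY statement looked roughly like this (a sketch only; the storage URL, path, and credential are placeholders rather than our actual values):

    COPY INTO stage.stage_email
    FROM 'https://<storageaccount>.dfs.core.windows.net/<container>/email/*.parquet'
    WITH (
        FILE_TYPE = 'PARQUET',
        CREDENTIAL = (IDENTITY = 'Managed Identity')
    );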
    