nvarchar(max) issue while incremental load

Ram Babu 20 Reputation points


From salesforce to azure syanpse ded pool, full load is completed using autocreate table option. The data loaded and the nvarchar(max) updated in synapse. while incremental load the large data not supported because of nvarchar limit is 4000. Is there any other option to load more than 4k data in nvarchar while incremental load. Please advice.

salesforce have data type 'Long Text Area(131072) and data is availbale upto 40k character length in that record. Please advice to load this data in syanpse while incremental load in data factory.

Thank you

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
3,555 questions
{count} votes

1 answer

Sort by: Most helpful
  1. RevelinoB 1,950 Reputation points

    Hi Ram Babu,

    As I recall, Azure Synapse Analytics (formerly known as SQL Data Warehouse) has certain limitations when compared to traditional SQL Server, especially concerning column sizes for string data types. The NVARCHAR(MAX) type, which can store more than 2 GB of data in traditional SQL Server, has a limitation in Synapse to only support 4000 characters.

    When trying to load data from Salesforce which has a field type 'Long Text Area' that can store up to 131,072 characters, this limitation can become an issue if the data exceeds the 4000 character limit of Synapse's NVARCHAR(4000).

    Here's what you can do to load such large text fields incrementally into Synapse:

    • Use Multiple Columns: If it's feasible and acceptable, split the long text from Salesforce into multiple NVARCHAR(4000) columns in Synapse. Then, during incremental loads, break the incoming text into parts and save each part in a corresponding column.
    • Store as BLOB: Convert the long text into a binary format, like a BLOB, and store it in Synapse. Azure Synapse Analytics supports VARBINARY(MAX), which can be used to store large amounts of binary data. When you need the data, you would then convert it back from binary to text.
    • Azure Blob Storage as an Intermediate: Use Azure Blob Storage as an intermediate storage. During your incremental load:
    • Save the long text data from Salesforce into a Blob in Azure Blob Storage.
    • Store a reference to that Blob (e.g., the Blob URL or Blob name) in the Synapse table.
    • When you need to access the data, you can retrieve the entire text content from the Blob using the reference.
    • Use a Different Type of Data Storage: If maintaining data fidelity is of utmost importance and you foresee frequent encounters with such large text data, you might want to reconsider using Synapse for this specific dataset. You can utilize Azure SQL Database or other big data storage solutions that support large text fields natively.
    • Compression: This is a bit advanced, but if you're willing, you can compress the text data (maybe using a technique like GZIP) and then store the compressed binary data in a VARBINARY(MAX) column. This won't be directly readable without decompression, but it will allow you to store a significant amount of text data, especially if the text is repetitive or has patterns that make it compressible.
    • Feedback to Microsoft: Azure Synapse Analytics and Azure Data Factory are products that are continuously evolving. If you feel a particular feature or support is missing, it's always good to provide feedback directly to Microsoft. They often prioritize features based on user feedback. Bare in mind, that whatever solution you choose, make sure it aligns well with your downstream processing and access patterns. If you're frequently querying or updating these large text fields, you'll want to ensure that the additional complexity doesn't adversely affect performance or complicate your architecture.

    I hope this answers your query?

    0 comments No comments