Why does Data Factory Salesforce Connector not pull correct column formats?

Blasko, Dan W 0 Reputation points
2024-06-18T12:37:25.0066667+00:00

We had a Data Factory pipeline connecting to Salesforce using integration runtime 4.9, and everything worked perfectly. We upgraded our integration runtime to 5.1 because of a SQL Server connector issue, and now our Salesforce connector does not work properly. For example, when I pull the calendar table from Salesforce, the ID comes in as nvarchar(max) instead of nvarchar(18). All nvarchar columns are now created as nvarchar(max) instead of with the correct lengths.

I use a Lookup activity to pull 400+ table names from a control table, then loop over those names, pull the data from Salesforce, and land it in SQL Server. No mappings are required. The pipeline re-creates the 400 tables each night, and after creation the last step runs a stored procedure to replace the indexes.

I was told to create all 400+ tables manually and then do a truncate and load instead. However, any time a column is added, changed, or deleted, that process would fail.

I just upgraded to 5.2 yesterday and the problem still exists. It works perfectly in 4.8 and 4.9.

Does anyone have a workaround for this? I hope I'm not the only person pulling Salesforce data into SQL Server.

I do not want 400 pipelines to replace something I did with one pipeline and three objects.


5 answers

  1. Bhargava-MSFT 28,951 Reputation points Microsoft Employee
    2024-06-18T16:30:37.5733333+00:00

    Hello Blasko, Dan W,

    Welcome to the Microsoft Q&A forum.

    Have you upgraded to the new Salesforce connector?

    I don't believe the issue you're facing is due to the integration runtime upgrade. After an internal discussion, it has been confirmed that this behavior is by design.

    When auto-creating tables, the SQL sink uses nvarchar(max) as the default type for string columns coming from non-SQL-family sources. This is because, when dealing with external data, it is difficult to determine the optimal size for these columns, and using max ensures that all possible data sizes are accommodated.

    Workaround:

    Use an existing table, or implement a pre-copy script that creates the tables with appropriate column lengths before the data is loaded; see the sketch below.
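
    For illustration only, here is a minimal sketch of that pre-copy idea, run before the Copy activity. The connection string, table name, and column lengths are hypothetical and would come from your own control table:

    ```python
    # Minimal sketch: pre-create the target table with explicit nvarchar lengths so the
    # sink does not fall back to nvarchar(max). All names, lengths, and the connection
    # string are placeholder assumptions.
    import pyodbc

    # Desired schema, e.g. read from the control table that drives the ForEach loop
    columns = {
        "Id": "nvarchar(18)",
        "Name": "nvarchar(80)",
        "CreatedDate": "datetime2",
    }

    ddl = (
        "DROP TABLE IF EXISTS dbo.Calendar; "  # requires SQL Server 2016 or later
        "CREATE TABLE dbo.Calendar ("
        + ", ".join(f"[{name}] {sql_type}" for name, sql_type in columns.items())
        + ");"
    )

    conn_str = (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<your server>;DATABASE=<your database>;Trusted_Connection=yes;"
    )
    with pyodbc.connect(conn_str) as conn:
        conn.execute(ddl)
        conn.commit()
    ```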


  2. Adam VEGH 0 Reputation points
    2024-06-19T14:48:27.39+00:00

    I've got the same issue. It took a day or two to build a workaround and pull the schema from SF via the REST API. I then create the tables in Synapse with that schema and try to pull the data from SF. But even when I explicitly define the schema of the table and write the query with the same columns defined (by the way, removing SELECT * also seems like a huge step back), the copy activity still fails for some tables and succeeds for others. I just don't understand it.
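
    For anyone trying the same route, here is a rough sketch of pulling field lengths from the Salesforce REST describe endpoint and turning them into CREATE TABLE statements. The instance URL, API version, access token, and the (deliberately simplified) type mapping are assumptions you would adapt to your own org:

    ```python
    # Rough sketch: read an object's schema from the Salesforce describe endpoint and
    # build DDL with explicit lengths instead of nvarchar(max). URL, token, API version,
    # and type mapping are placeholder assumptions.
    import requests

    INSTANCE = "https://yourorg.my.salesforce.com"   # placeholder
    TOKEN = "<access token from your OAuth flow>"    # placeholder

    def sf_type_to_sql(field: dict) -> str:
        """Very simplified mapping from Salesforce field metadata to SQL Server types."""
        t = field["type"]
        if t in ("id", "reference", "string", "picklist", "textarea", "phone", "url", "email"):
            length = field.get("length") or 255
            return f"nvarchar({length})" if length <= 4000 else "nvarchar(max)"
        if t == "int":
            return "int"
        if t in ("double", "currency", "percent"):
            return f"decimal({field.get('precision', 18)},{field.get('scale', 2)})"
        if t == "boolean":
            return "bit"
        if t == "date":
            return "date"
        if t == "datetime":
            return "datetime2"
        return "nvarchar(max)"  # fallback for anything unmapped

    def build_ddl(object_name: str) -> str:
        url = f"{INSTANCE}/services/data/v59.0/sobjects/{object_name}/describe"
        resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"})
        resp.raise_for_status()
        cols = ", ".join(f"[{f['name']}] {sf_type_to_sql(f)}" for f in resp.json()["fields"])
        return f"CREATE TABLE dbo.[{object_name}] ({cols});"

    print(build_ddl("Account"))
    ```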


  3. Pinaki Ghatak 3,110 Reputation points Microsoft Employee
    2024-07-03T09:49:06.2233333+00:00

    Hello @Blasko, Dan W

    The issue you are facing is related to the upgrade of the integration runtime from version 4.9 to version 5.1. It is possible that the upgrade caused changes in the behavior of the Salesforce connector, resulting in the issues you are experiencing.

    One possible workaround for this issue is to use a mapping data flow in your pipeline to transform the data from Salesforce before loading it into SQL Server. This would allow you to specify the correct data types for each column and ensure that the data is loaded correctly.

    You can also use the mapping data flow to perform any necessary data transformations or aggregations. Another option is to use a stored procedure to create the tables dynamically based on the metadata retrieved from Salesforce.

    This would allow you to ensure that the tables are created with the correct data types and lengths.

    You can then use a bulk insert or another method to load the data into the tables. It is also possible that there is a bug in the Salesforce connector in versions 5.1 and 5.2 of the integration runtime.

    I hope this helps.


  4. Deleted

  5. Adam VEGH 0 Reputation points
    2024-07-03T12:06:55.04+00:00

    Hi Everyone,

    I created another workaround with Web, Script and Copy activities:

    https://adamsperspectiveondata.com/2024/07/01/upgrade-pipeline-to-new-salesforce-connector/
