Limitations for Salesforce ingestion connections
Important
LakeFlow Connect is in gated Public Preview. To participate in the preview, contact your Databricks account team.
This article lists limitations and considerations for connecting to and ingesting data from Salesforce using LakeFlow Connect.
When you run a scheduled pipeline, alerts don’t trigger immediately. Instead, they trigger when the next update runs.
When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This behavior is not consistent with Delta Live Tables behavior.
When you select a schema, you are selecting not only the current tables but also the future tables that get added to it.
During maintenance periods, Databricks might not be able to access your data.
If the source table name conflicts with an existing destination table name, the flow fails.
There is a maximum of 250 objects per pipeline. However, there is no known limit on the number of rows or columns that are supported within these objects.
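One way to stay under the 250-object limit is to split a long object list across several pipelines. The following is a minimal sketch of that grouping only. The function name, the constant name, and the example object names are hypothetical and are not part of LakeFlow Connect, and creating the pipelines themselves is out of scope.

```python
from typing import List

# Limit stated in this article; the constant name is an assumption.
MAX_OBJECTS_PER_PIPELINE = 250


def split_into_pipelines(
    objects: List[str], limit: int = MAX_OBJECTS_PER_PIPELINE
) -> List[List[str]]:
    """Split object names into consecutive groups of at most `limit` items."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [objects[i:i + limit] for i in range(0, len(objects), limit)]


# Hypothetical object names, for illustration only.
objects = [f"Object_{n}__c" for n in range(600)]
groups = split_into_pipelines(objects)
print([len(g) for g in groups])  # [250, 250, 100]
```

Each resulting group would then be configured as its own pipeline.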
base64, address, location, or complexValue types are not supported. These columns are automatically pruned during ingestion.
The following objects are not supported:
- Objects with WHERE clauses or LIMIT restrictions
- Objects for real-time event monitoring
- Objects ending with __b, __x, or __hd
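The name-suffix and column-type rules above can be checked ahead of time when planning which objects to ingest. The sketch below is illustrative only: the helper names and the shape of the field mapping (field name to Salesforce type name) are assumptions, and it does not cover the WHERE/LIMIT or real-time event monitoring restrictions, which cannot be detected from a name.

```python
from typing import Dict

# Suffixes and types listed in this article.
UNSUPPORTED_SUFFIXES = ("__b", "__x", "__hd")
UNSUPPORTED_FIELD_TYPES = {"base64", "address", "location", "complexValue"}


def is_supported_object(name: str) -> bool:
    """Return False for object names that end with an unsupported suffix."""
    return not name.endswith(UNSUPPORTED_SUFFIXES)


def prune_fields(fields: Dict[str, str]) -> Dict[str, str]:
    """Drop fields whose type is unsupported, mirroring the pruning described above."""
    return {
        name: ftype
        for name, ftype in fields.items()
        if ftype not in UNSUPPORTED_FIELD_TYPES
    }


# Hypothetical object and field names, for illustration only.
print(is_supported_object("Account"))      # True
print(is_supported_object("Archive__b"))   # False
print(prune_fields({"Name": "string", "BillingAddress": "address"}))
# {'Name': 'string'}
```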
Only one destination per pipeline is supported. All tables must have the same catalog and schema.
Databricks selects a cursor column to track the incremental reading of data and to sequence changes that share the same primary key. Databricks selects the cursor column from the following list, in order of preference: SystemModstamp, LastModifiedDate, CreatedDate, and LoginTime. For example, if SystemModstamp is unavailable, Databricks looks for LastModifiedDate. If an object doesn’t have one of these cursors, Databricks takes a snapshot of the table.
Databricks automatically transforms Salesforce data types to Delta-compatible data types. For the approximate list of transformations, see Automatic data transformations.
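The cursor column preference order described above amounts to a first-match lookup. The following sketch illustrates that logic under the assumption that you have the object's field names available as strings. The function name is hypothetical, and this is not the connector's actual implementation.

```python
from typing import Iterable, Optional

# Preference order as listed in this article.
CURSOR_PREFERENCE = ("SystemModstamp", "LastModifiedDate", "CreatedDate", "LoginTime")


def pick_cursor_column(field_names: Iterable[str]) -> Optional[str]:
    """Return the most preferred cursor column present, or None.

    None corresponds to the snapshot case: the object has no usable cursor.
    """
    available = set(field_names)
    for candidate in CURSOR_PREFERENCE:
        if candidate in available:
            return candidate
    return None


print(pick_cursor_column(["Id", "LastModifiedDate", "CreatedDate"]))  # LastModifiedDate
print(pick_cursor_column(["Id", "Name"]))                             # None
```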
Databricks can ingest formula fields. However, at this time, Databricks requires a full snapshot of these fields. As a result, pipeline latency depends on whether your Salesforce data includes formula fields and on the volume of updates in your Salesforce data.
Databricks runs formula fields at the same cadence as the rest of the pipeline. However, within the cadence of your pipeline updates, the non-formula fields might be updated earlier than the formula fields.
Databricks treats deletions the same as inserts and updates. When a row is deleted from Salesforce, it is deleted from the bronze table at the next sync of the data. For example, suppose you have a pipeline running hourly. If a sync runs at 12:00 PM and a record is deleted at 12:30 PM, the deletion isn’t reflected until the 1:00 PM sync occurs.
There is one edge case: If the pipeline didn’t run after the records were deleted but before they were purged from the Salesforce Recycle Bin, Databricks misses those deletes. The only way to recover from this is with a full refresh.
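The hourly example above can be worked out with a small date calculation. This sketch assumes a fixed-interval schedule and treats a change that lands exactly on a sync time as being picked up by the following sync. Both are simplifying assumptions, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def next_sync_after(
    event_time: datetime, first_sync: datetime, interval: timedelta
) -> datetime:
    """Return the first scheduled sync strictly after `event_time`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if event_time < first_sync:
        return first_sync
    # Whole intervals already elapsed since the first sync.
    periods_elapsed = (event_time - first_sync) // interval
    return first_sync + (periods_elapsed + 1) * interval


# Sync at 12:00 PM, record deleted at 12:30 PM, hourly schedule.
first_sync = datetime(2024, 1, 1, 12, 0)
deleted_at = datetime(2024, 1, 1, 12, 30)
print(next_sync_after(deleted_at, first_sync, timedelta(hours=1)))
# 2024-01-01 13:00:00
```

The same calculation does not help with the edge case: if the record is purged from Salesforce before that next sync runs, the deletion is missed regardless of the schedule.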
Salesforce has a limit of 4 connections per user per connected app. If you need to create more than 4 connections, create a new user.
For security purposes, only authenticate if you clicked an OAuth link in the Azure Databricks UI.
Salesforce allows you to rotate a refresh token. The connector doesn’t support this.
Deletions
- If the table is enabled for soft deletion:
  - The deletion is reflected in the next sync.
- If the table does not support soft deletion:
  - The connector automatically takes batch snapshots for ingestion.
  - Exception: This doesn’t work for history objects. The connector assumes that these are incrementally read and that no deletions happen.
- If the table supports soft deletion and the user hard-deletes records manually, the connector cannot capture the deletion. The record stays in the bronze table unless you perform a full refresh.
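The deletion cases listed above can be summarized as a small decision function. This is only one reading of the list, written as a sketch. The enum, the function, and the parameter names are hypothetical, and the connector does not expose anything like them.

```python
from enum import Enum


class DeletionHandling(Enum):
    NEXT_SYNC = "reflected in the next sync"
    SNAPSHOT = "handled through snapshot ingestion"
    ASSUMED_NONE = "history object: read incrementally, deletions assumed not to happen"
    NOT_CAPTURED = "not captured; record stays until a full refresh"


def deletion_handling(
    supports_soft_delete: bool,
    is_history_object: bool = False,
    hard_deleted: bool = False,
) -> DeletionHandling:
    """Map the cases in the Deletions list to an outcome."""
    if supports_soft_delete:
        # A manual hard delete bypasses soft deletion, so it is missed.
        return DeletionHandling.NOT_CAPTURED if hard_deleted else DeletionHandling.NEXT_SYNC
    if is_history_object:
        return DeletionHandling.ASSUMED_NONE
    return DeletionHandling.SNAPSHOT


print(deletion_handling(supports_soft_delete=True).value)
# reflected in the next sync
print(deletion_handling(supports_soft_delete=True, hard_deleted=True).value)
# not captured; record stays until a full refresh
```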