Transform data from an SAP ODP source using the SAP CDC connector in Azure Data Factory or Azure Synapse Analytics
APPLIES TO: Azure Data Factory Azure Synapse Analytics
This article outlines how to use mapping data flow to transform data from an SAP ODP source using the SAP CDC connector. To learn more, read the introductory article for Azure Data Factory or Azure Synapse Analytics. For an introduction to transforming data with Azure Data Factory and Azure Synapse Analytics, read mapping data flow.
To learn about overall support for SAP data integration scenarios, see the SAP data integration using Azure Data Factory whitepaper, which provides a detailed introduction to each SAP connector, along with comparisons and guidance.
This SAP CDC connector is supported for the following capabilities:
|Mapping data flow (source/-)||①, ②|
① Azure integration runtime ② Self-hosted integration runtime
This SAP CDC connector leverages the SAP ODP framework to extract data from SAP source systems. For an introduction to the architecture of the solution, read Introduction and architecture to SAP change data capture (CDC) in our SAP knowledge center.
The SAP ODP framework is contained in most SAP NetWeaver based systems except very old ones, including SAP ECC, SAP S/4HANA, SAP BW, SAP BW/4HANA, and SAP LT Replication Server (SLT). For prerequisites and minimum required releases, see Prerequisites and configuration.
The SAP CDC connector supports basic authentication or Secure Network Communications (SNC), if SNC is configured.
To use this SAP CDC connector, you need to:
Set up a self-hosted integration runtime. The most recent version can be found in Microsoft Download Center. For more information, see Create and configure a self-hosted integration runtime.
Download the 64-bit SAP Connector for Microsoft .NET 3.0 from SAP's website, and install it on the self-hosted integration runtime machine. During installation, make sure you select the Install Assemblies to GAC option in the Optional setup steps window.
The SAP user who's being used in the SAP CDC connector must have the permissions described in User Configuration.
To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:
- The Copy Data tool
- The Azure portal
- The .NET SDK
- The Python SDK
- Azure PowerShell
- The REST API
- The Azure Resource Manager template
Create a linked service for the SAP CDC connector using UI
Follow the steps described in Prepare the SAP CDC linked service to create a linked service for the SAP CDC connector in the Azure portal UI.
To prepare an SAP CDC dataset, follow Prepare the SAP CDC source dataset.
Transform data with the SAP CDC connector
SAP CDC datasets can be used as a source in mapping data flow. Because the raw SAP ODP change feed is difficult to interpret and to apply correctly to a sink, mapping data flow handles this automatically by evaluating technical attributes provided by the ODP framework (for example, ODQ_CHANGEMODE). This lets users concentrate on the required transformation logic without having to deal with the internals of the SAP ODP change feed, the correct order of changes, and so on.
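To illustrate why the order of changes and the change mode matter, here is a minimal Python sketch of applying a change feed to a keyed sink. It is not how mapping data flow is implemented. The function `apply_changes`, the sample columns `ORDER_ID` and `STATUS`, and the assumed `ODQ_CHANGEMODE` values ("C" for created, "U" for updated, "D" for deleted) are illustrative assumptions, and a real ODP feed carries further technical attributes that this sketch ignores.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Key = Tuple[str, ...]


def apply_changes(sink: Dict[Key, Row], changes: List[Row], key_columns: List[str]) -> Dict[Key, Row]:
    """Apply change records to an in-memory sink, strictly in feed order."""
    for record in changes:
        key = tuple(record[column] for column in key_columns)
        mode = record["ODQ_CHANGEMODE"]
        # Drop the technical attribute before the row reaches the sink.
        row = {name: value for name, value in record.items() if name != "ODQ_CHANGEMODE"}
        if mode == "D":          # assumed: deletion
            sink.pop(key, None)
        elif mode in ("C", "U"):  # assumed: creation or update, treated as an upsert
            sink[key] = row
        else:
            raise ValueError(f"Unexpected change mode: {mode!r}")
    return sink


# Hypothetical sample feed: the same key is created, then updated; another key is created, then deleted.
sink: Dict[Key, Row] = {}
changes = [
    {"ORDER_ID": "1", "STATUS": "NEW", "ODQ_CHANGEMODE": "C"},
    {"ORDER_ID": "2", "STATUS": "NEW", "ODQ_CHANGEMODE": "C"},
    {"ORDER_ID": "1", "STATUS": "SHIPPED", "ODQ_CHANGEMODE": "U"},
    {"ORDER_ID": "2", "STATUS": "NEW", "ODQ_CHANGEMODE": "D"},
]
apply_changes(sink, changes, key_columns=["ORDER_ID"])
print(sink)
```

Applying the same records in a different order (for example, the delete before the create) would leave a different result in the sink, which is the kind of bookkeeping the connector takes off your hands.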
Mapping data flow properties
To create a mapping data flow using the SAP CDC connector as a source, complete the following steps:
In ADF Studio, go to the Data flows section of the Author hub, select the … button to drop down the Data flow actions menu, and select the New data flow item. Turn on debug mode by using the Data flow debug button in the top bar of the data flow canvas.
In the mapping data flow editor, select Add Source.
On the Source settings tab, select a prepared SAP CDC dataset, or select the New button to create a new one. Alternatively, you can select Inline in the Source type property and continue without defining an explicit dataset.
On the Source options tab, select one of the following Run mode options:
- Full on every run: loads a full snapshot on every execution of your mapping data flow.
- Full on the first run, then incremental: subscribes to a change feed from the SAP source system. The first run of your pipeline performs a delta initialization, which means it returns a current full data snapshot and creates an ODP delta subscription in the source system. On subsequent runs, the SAP source system returns only the incremental changes since the previous run.
- Incremental changes only: creates an ODP delta subscription in the SAP source system on the first run of your pipeline without returning any data. On subsequent runs, the SAP source system returns only the incremental changes since the previous run.
For incremental loads, you must specify the keys of the ODP source object in the Key columns property.
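The difference between the run modes can be sketched with a small Python simulation. This is a toy model of the behavior described in this step, not the connector's API. The class `FakeOdpSource` and the mode strings `full_on_every_run`, `full_then_incremental`, and `incremental_only` are hypothetical names chosen for illustration.

```python
class FakeOdpSource:
    """Hypothetical stand-in for an SAP source object with a delta subscription."""

    MODES = ("full_on_every_run", "full_then_incremental", "incremental_only")

    def __init__(self, rows):
        self.rows = dict(rows)   # current snapshot: key -> value
        self.pending = []        # changes recorded since the last delta read
        self.subscribed = False  # whether a delta subscription exists

    def change(self, key, value):
        """Modify the source; changes are only tracked once a subscription exists."""
        self.rows[key] = value
        if self.subscribed:
            self.pending.append((key, value))

    def read(self, run_mode):
        """Return the rows one pipeline run would receive for the given run mode."""
        if run_mode not in self.MODES:
            raise ValueError(f"Unknown run mode: {run_mode!r}")
        if run_mode == "full_on_every_run":
            return list(self.rows.items())
        if not self.subscribed:
            # First run: create the subscription (delta initialization).
            self.subscribed = True
            if run_mode == "full_then_incremental":
                return list(self.rows.items())  # current full snapshot
            return []                           # incremental only: no data yet
        # Subsequent runs: hand over the changes since the previous run, then reset.
        delta, self.pending = self.pending, []
        return delta


src = FakeOdpSource({"A": 1, "B": 2})
first = src.read("full_then_incremental")   # full snapshot plus subscription
src.change("A", 10)
second = src.read("full_then_incremental")  # only the change to "A"
third = src.read("full_then_incremental")   # nothing changed since the last run

src_inc = FakeOdpSource({"A": 1})
first_inc = src_inc.read("incremental_only")  # subscription created, no data returned
print(first, second, third, first_inc)
```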
For the Projection, Optimize, and Inspect tabs, follow mapping data flow.
If Run mode is set to Full on every run or Full on the first run, then incremental, the Optimize tab offers additional selection and partitioning options. Each partition condition (the screenshot below shows an example with two conditions) triggers a separate extraction process in the connected SAP system. Up to three of these extraction processes are executed in parallel.
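The partitioning behavior can be pictured as one extraction task per partition condition, with at most three tasks running at once. The following Python sketch mirrors that idea with a thread pool. It is an analogy only: `extract_partition`, `extract_all`, and the numeric range conditions are hypothetical placeholders and do not call SAP or Azure Data Factory.

```python
from concurrent.futures import ThreadPoolExecutor

# From the text above: up to three extraction processes run in parallel.
MAX_PARALLEL_EXTRACTIONS = 3


def extract_partition(condition):
    """Hypothetical placeholder for one extraction process in the SAP system."""
    low, high = condition
    return list(range(low, high + 1))


def extract_all(conditions):
    """Run one extraction per partition condition, at most three at a time."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_EXTRACTIONS) as pool:
        # map() returns results in the order of the input conditions.
        parts = list(pool.map(extract_partition, conditions))
    return [row for part in parts for row in part]


# Two partition conditions, so two separate extraction tasks are started.
conditions = [(1, 3), (4, 6)]
rows = extract_all(conditions)
print(rows)
```

With more than three conditions, the remaining tasks in this sketch wait until a worker becomes free.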