Transform data from an SAP ODP source using the SAP CDC connector in Azure Data Factory or Azure Synapse Analytics

APPLIES TO: Azure Data Factory Azure Synapse Analytics

This article outlines how to use mapping data flow to transform data from an SAP ODP source using the SAP CDC connector. To learn more, read the introductory article for Azure Data Factory or Azure Synapse Analytics. For an introduction to transforming data with Azure Data Factory and Azure Synapse Analytics, read mapping data flow.

Tip

For an overview of SAP data integration scenarios, see the SAP data integration using Azure Data Factory whitepaper, which provides a detailed introduction to each SAP connector, along with comparisons and guidance.

Supported capabilities

This SAP CDC connector is supported for the following capabilities:

| Supported capabilities | IR |
| --- | --- |
| Mapping data flow (source/-) | ①, ② |

① Azure integration runtime ② Self-hosted integration runtime

This SAP CDC connector leverages the SAP ODP framework to extract data from SAP source systems. For an introduction to the architecture of the solution, read Introduction and architecture of SAP change data capture (CDC) in our SAP knowledge center.

The SAP ODP framework is included in most SAP NetWeaver based systems, with the exception of very old releases. Supported systems include SAP ECC, SAP S/4HANA, SAP BW, SAP BW/4HANA, and SAP LT Replication Server (SLT). For prerequisites and minimum required releases, see Prerequisites and configuration.

The SAP CDC connector supports basic authentication and, if configured in the SAP system, Secure Network Communications (SNC).

Prerequisites

To use this SAP CDC connector, complete the steps described in Prerequisites and configuration.

Get started

To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:

Create a linked service for the SAP CDC connector using UI

Follow the steps described in Prepare the SAP CDC linked service to create a linked service for the SAP CDC connector in the Azure portal UI.
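Although the linked service is typically created through the UI, the authoring UI also exposes a JSON view. The following is a hedged sketch of what a definition may look like; the type name `SapOdp` and the property names follow the pattern of other SAP linked services and should be verified against the JSON view of a linked service created in your own factory:

```json
{
    "name": "SapCdcLinkedService",
    "properties": {
        "type": "SapOdp",
        "typeProperties": {
            "server": "<SAP application server name>",
            "systemNumber": "<system number>",
            "clientId": "<client ID>",
            "language": "EN",
            "userName": "<SAP user>",
            "password": {
                "type": "SecureString",
                "value": "<password>"
            },
            "subscriberName": "<subscriber name>"
        },
        "connectVia": {
            "referenceName": "<self-hosted integration runtime name>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```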

Dataset properties

To prepare an SAP CDC dataset, follow Prepare the SAP CDC source dataset.
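As with the linked service, the dataset can be inspected as JSON in the authoring UI. The sketch below is an assumption based on the fields shown in the dataset editor (ODP context and object name); verify the exact type and property names against the JSON view of your own dataset:

```json
{
    "name": "SapCdcDataset",
    "properties": {
        "type": "SapOdpResource",
        "linkedServiceName": {
            "referenceName": "SapCdcLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "context": "ABAP_CDS",
            "objectName": "<ODP object name>"
        }
    }
}
```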

Transform data with the SAP CDC connector

SAP CDC datasets can be used as a source in mapping data flows. Because the raw SAP ODP change feed is difficult to interpret and to apply correctly to a sink, mapping data flows take care of this automatically by evaluating technical attributes provided by the ODP framework (for example, ODQ_CHANGEMODE). This allows users to concentrate on the required transformation logic without having to deal with the internals of the SAP ODP change feed, such as the correct ordering of changes.
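Conceptually, what the data flow runtime does on your behalf is replay the change feed in order and collapse it into a net state per key. The following minimal Python sketch is purely illustrative (mapping data flow performs this automatically), and it assumes the commonly documented ODQ_CHANGEMODE codes `C` (created), `U` (updated), and `D` (deleted):

```python
# Illustrative only: mapping data flow evaluates ODQ_CHANGEMODE for you.
# Assumes the codes 'C' (created), 'U' (updated), 'D' (deleted).

def collapse_change_feed(rows, key):
    """Replay change-feed rows in order and return the net state per key."""
    state = {}
    for row in rows:
        k = row[key]
        mode = row["ODQ_CHANGEMODE"]
        if mode in ("C", "U"):
            # Insert or update: keep the latest image, minus the technical column.
            state[k] = {c: v for c, v in row.items() if c != "ODQ_CHANGEMODE"}
        elif mode == "D":
            # Delete: drop the key if present.
            state.pop(k, None)
    return state

feed = [
    {"MATNR": "A1", "PRICE": 10, "ODQ_CHANGEMODE": "C"},
    {"MATNR": "A2", "PRICE": 20, "ODQ_CHANGEMODE": "C"},
    {"MATNR": "A1", "PRICE": 12, "ODQ_CHANGEMODE": "U"},
    {"MATNR": "A2", "ODQ_CHANGEMODE": "D"},
]
print(collapse_change_feed(feed, "MATNR"))
# → {'A1': {'MATNR': 'A1', 'PRICE': 12}}
```

This is why the Key columns property matters for incremental loads: without keys, changes cannot be matched to the rows they modify.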

Mapping data flow properties

To create a mapping data flow using the SAP CDC connector as a source, complete the following steps:

  1. In ADF Studio, go to the Data flows section of the Author hub, select the button to drop down the Data flow actions menu, and select the New data flow item. Turn on debug mode by using the Data flow debug button in the top bar of the data flow canvas.

    Screenshot of the data flow debug button in mapping data flow.

  2. In the mapping data flow editor, select Add Source.

    Screenshot of add source in mapping data flow.

  3. On the Source settings tab, select a prepared SAP CDC dataset or select the New button to create a new one. Alternatively, you can select Inline in the Source type property and continue without defining an explicit dataset.

    Screenshot of the select dataset option in source settings of mapping data flow source.

  4. On the Source options tab, select the Run mode:

     - Full on every run: loads a full snapshot on every execution of your mapping data flow.

     - Full on the first run, then incremental: subscribes to a change feed from the SAP source system. The first run of your pipeline performs a delta initialization: it returns a current full data snapshot and creates an ODP delta subscription in the source system. With subsequent runs, the SAP source system returns only the incremental changes since the previous run.

     - Incremental changes only: creates an ODP delta subscription in the SAP source system in the first run of your pipeline without returning any data. With subsequent runs, the SAP source system returns only the incremental changes since the previous run.

     For incremental loads, you must specify the keys of the ODP source object in the Key columns property.

    Screenshot of the run mode property in source options of mapping data flow source.

    Screenshot of the key columns selection in source options of mapping data flow source.

  5. For the Projection, Optimize, and Inspect tabs, follow mapping data flow.

  6. If Run mode is set to Full on every run or Full on the first run, then incremental, the Optimize tab offers additional selection and partitioning options. Each partition condition (the following screenshot shows an example with two conditions) triggers a separate extraction process in the connected SAP system. Up to three of these extraction processes are executed in parallel.

    Screenshot of the partitioning options in optimize of mapping data flow source.
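The parallelism described in step 6 can be pictured as a bounded worker pool: each partition condition becomes an independent extraction, and at most three run concurrently. The Python sketch below only illustrates this scheduling behavior; the partition conditions are hypothetical, and the real extraction happens inside the connected SAP system:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partition conditions; in ADF these are defined on the
# Optimize tab, and each one triggers a separate extraction in SAP.
partitions = [
    "CALYEAR <= 2019",
    "CALYEAR in (2020, 2021)",
    "CALYEAR >= 2022",
]

def extract(condition):
    # Placeholder for the SAP-side extraction of one partition.
    return f"extracted rows where {condition}"

# Up to three extraction processes run in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(extract, partitions))

for r in results:
    print(r)
```

Choosing partition conditions that split the data into similarly sized, non-overlapping slices helps the parallel extractions finish at roughly the same time.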