Issue with Leading Spaces in Key Columns Using ADF SAP CDC Connector
Problem Statement:
I'm utilizing Azure Data Factory (ADF) with the SAP CDC connector to extract data from several tables in SAP ECC and SAP HANA. I encountered a significant issue regarding leading spaces in key columns during data extraction.
Here’s a detailed description of the problem:
Scenario:
When extracting data, the source system contains values for the MATNR field as "ABC" and "ABC ". The first value ("ABC") doesn't have a leading space, while the second value ("ABC ") has a leading space.
Issue:
During the data extraction process via ADF mapping dataflows and the SAP CDC connector, the data is written to a staging text file. However, the leading spaces in the field values are trimmed automatically by the connector or the data flows. As a result, both values are transformed to "ABC" in the staging file.
Consequences:
Duplication Issues: When the affected field is a key column, this eradicates the ability to distinguish between the records, leading to duplicates and incorrect data representation.
Example:
Source System Data:
Record 1: MATNR = "ABC"
Record 2: MATNR = "ABC "
Extracted Data in Staging File:
Record 1: MATNR = "ABC"
Record 2: MATNR = "ABC"
Request:
I'm seeking help or suggestions on how to configure the ADF SAP CDC connector or mapping dataflows to preserve leading spaces during the extraction process. If there are any techniques or best practices to handle this scenario effectively, please share them.
Things I've Tried:
Checking configuration options in the SAP CDC connector.
Using transformations in ADF mapping dataflows.
Unfortunately, none of these approaches seem to prevent the automatic trimming of leading spaces.
Additional Information:
Tools Used: Azure Data Factory, SAP CDC Connector, Mapping Dataflows
Source Systems: SAP ECC, SAP HANA
Target File Type: Text File in staging and parquet in sink
Any guidance or solutions to maintain the integrity of key columns during extraction would be greatly appreciated.