azure synapse copy -: xml Copying from on-premises to azure data lake having null values for CDATA element

Kumar, Amit 41 Reputation points
2023-06-18T07:59:41.69+00:00

Hi ,

We have differnt kpi data in form of xml. During copying these xml to azure data lake , these xml converted into json. Json files are very much perfect except issue " Null value for CDATA element"

As below sample , DN value present as CDATA element in xml but become NULL in correspsing json.

Can you please suggest some solution for this.

Note-: From Data lake , these json data would be processed by spark job and insert data in azure sql pool.

XML-:

<?xml version="1.0"?>
<OMeS>
  <PMSetup  startTime="2023-06-15T16:00:00.000+05:30:00" interval="15">
    <PMMOResult>
      <MO>
        <DN><![CDATA[PLMN-PLMN/MRBTS-1224]]></DN>
      </MO>
      <PMTarget  measurementType="SBTS_Energy_Consumption">
        <M40002C1>0</M40002C1>
        <M40002C2>5622</M40002C2>
        <M40002C0>5622</M40002C0>
      </PMTarget>
    </PMMOResult>
    <PMMOResult>
      <MO>
        <DN><![CDATA[PLMN-PLMN/MRBTS-1224/LNBTS-1224]]></DN>
      </MO>
      <PMTarget  measurementType="LTE_S1AP">
        <M8000C6>0</M8000C6>
        <M8000C7>0</M8000C7>
        <M8000C8>0</M8000C8>
        <M8000C9>0</M8000C9>
        <M8000C11>0</M8000C11>
        <M8000C13>0</M8000C13>
        <M8000C14>0</M8000C14>
        <M8000C15>0</M8000C15>
        <M8000C16>0</M8000C16>
        <M8000C26>0</M8000C26>
        <M8000C27>0</M8000C27>
        <M8000C37>0</M8000C37>
        <M8000C38>0</M8000C38>
        <M8000C39>0</M8000C39>
        <M8000C40>0</M8000C40>
        <M8000C41>0</M8000C41>
        <M8000C42>0</M8000C42>
      </PMTarget>
    </PMMOResult>
	 </PMSetup>
</OMeS>


JSON -:

{
	"OMeS": {
		"PMSetup": {
			"PMMOResult": [{
				"MO": {
					"DN": null
				},
				"PMTarget": {
					"M40002C1": 0,
					"M40002C2": 5622,
					"M40002C0": 5622,
					"@measurementType": "SBTS_Energy_Consumption"
				}
			}, {
				"MO": {
					"DN": null
				},
				"PMTarget": {
					"M8000C6": 0,
					"M8000C7": 0,
					"M8000C8": 0,
					"M8000C9": 0,
					"M8000C11": 0,
					"M8000C13": 0,
					"M8000C14": 0,
					"M8000C15": 0,
					"M8000C16": 0,
					"M8000C26": 0,
					"M8000C27": 0,
					"M8000C37": 0,
					"M8000C38": 0,
					"M8000C39": 0,
					"M8000C40": 0,
					"M8000C41": 0,
					"M8000C42": 0,
					"@measurementType": "LTE_S1AP"
				}
			}
"@startTime": "2023-06-15T16:00:00.000+05:30:00", 			"@interval": 15 		} 	}
Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
{count} votes

Answer accepted by question author
  1. AnnuKumari-MSFT 34,566 Reputation points Microsoft Employee Moderator
    2023-06-21T06:10:52.3633333+00:00

    Hi Kumar, Amit ,

    Thankyou for using Microsoft Q&A platform and thanks for posting your question here.

    As per my understanding regarding your query, it seems while trying to load the xml content into json file, null values are being generated for DN value of CDATA elements. Please let me know if that is not the correct interpretation.

    I tried to use mapping dataflow in order to convert the shared xml data into json and it is properly loading the data without converting it to NULL.

    Could you please try using XML connector for source dataset and JSON for sink and try loading the same using mapping dataflow and see if it works for you or not?

    Following is the generated output json :

    {
       "OMeS":{
          "PMSetup":{
             "@interval":"15",
             "@startTime":"2023-06-15T16:00:00.000+05:30:00",
             "PMMOResult":[
                {
                   "MO":{
                      "DN":"PLMN-PLMN/MRBTS-1224"
                   },
                   "PMTarget":{
                      "@measurementType":"SBTS_Energy_Consumption",
                      "M40002C0":"5622",
                      "M40002C1":"0",
                      "M40002C2":"5622"
                   }
                },
                {
                   "MO":{
                      "DN":"PLMN-PLMN/MRBTS-1224/LNBTS-1224"
                   },
                   "PMTarget":{
                      "@measurementType":"LTE_S1AP",
                      "M8000C11":"0",
                      "M8000C13":"0",
                      "M8000C14":"0",
                      "M8000C15":"0",
                      "M8000C16":"0",
                      "M8000C26":"0",
                      "M8000C27":"0",
                      "M8000C37":"0",
                      "M8000C38":"0",
                      "M8000C39":"0",
                      "M8000C40":"0",
                      "M8000C41":"0",
                      "M8000C42":"0",
                      "M8000C6":"0",
                      "M8000C7":"0",
                      "M8000C8":"0",
                      "M8000C9":"0"
                   }
                }
             ]
          }
       }
    }
    

    In case you have any obligations for not using dataflow , then I would request you to share your piece of code which you are using to convert the xml data to json so that I can repro it and see what is missing.

    Hope it helps. Kindly accept the answer if it is helpful. Thankyou.


0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.