Dynamic Mapping SOAP API XML data to parquet file

Rajesh Jyothi 1 Reputation point
2022-11-28T07:08:39.72+00:00

Hi everyone,

I have a question regarding dynamic mapping of SOAP API output into a parquet sink.

I have a lookup activity that reads data fields iteratively and passes them dynamically into the SOAP API request body. The sink is a parquet file.
While writing the data:

  • If the SOAP API returns multiple records (more than one), the data is written to data lake storage as a parquet file without problems.
  • If the SOAP API returns only one record, nothing is written to data lake storage. I noticed that when I open the mapping in the copy activity and select the Table array as the collection reference, I get data for responses with multiple records, while responses with only one record are ignored, and vice versa.
    I have attached sample SOAP API output in XML file format for better understanding.

Happy to discuss further.
Please let me know if anyone has any thoughts on this.

Thank you in advance.

264615-capture.png
264509-failed.xml
264510-success.xml

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. MartinJaffer-MSFT 26,161 Reputation points
    2022-12-01T05:24:12.34+00:00

    Hello @Rajesh Jyothi and welcome to Microsoft Q&A.

    The effect you see results from an ambiguity in converting XML to JSON. It has nothing to do with SOAP.

    To follow along, you can paste the examples below into any online XML-to-JSON converter.

    When we have XML like

    <box>  
        <record> data1 </record>  
        <record> data2 </record>  
    </box>  
    

    it gets converted to JSON like

    {  
       "box": {  
          "record": [  
             "data1",  
             "data2"  
          ]  
       }  
    }  
    

    However when we have data like

    <box>  
        <record> data1 </record>  
    </box>  
    

    it gets converted to

    {  
       "box": {  
          "record": "data1"  
       }  
    }  
    

    There isn't any way to say we want

    <box>  
        <record> data1 </record>  
    </box>  
    

    to become

    {  
       "box": {  
          "record": [  
             "data1"  
          ]  
       }  
    }  
    

    Both look the same in XML.
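
    The ambiguity can be reproduced outside Data Factory. The sketch below is a deliberately naive converter written for illustration only (it is not how Data Factory converts XML internally, and the names xml_to_dict and convert are made up here). It ignores attributes and mixed content, and simply turns repeated child tags into a list while leaving a lone child as a scalar.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_dict(elem):
    """Naive conversion: repeated child tags become a list, a lone child stays a scalar."""
    children = list(elem)
    if not children:
        # Leaf element: return its trimmed text.
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # Second or later occurrence of this tag: promote to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def convert(xml_text):
    """Parse an XML string and wrap the result under the root tag."""
    root = ET.fromstring(xml_text)
    return {root.tag: xml_to_dict(root)}


many = convert("<box><record> data1 </record><record> data2 </record></box>")
one = convert("<box><record> data1 </record></box>")

print(json.dumps(many))  # {"box": {"record": ["data1", "data2"]}}
print(json.dumps(one))   # {"box": {"record": "data1"}}
```

    The same element name ends up as an array in one result and as a plain string in the other, which is exactly the shape difference that a mapping expecting an array runs into.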

    So why not go the other way? (explained below)

    {  
       "box": {  
          "record": "data1",  
          "record": "data2"  
       }  
    }  
    

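    The above is not usable JSON because keys within an object are expected to be unique, and a typical parser would keep only one of the two values. The syntax to retrieve data1 would be exactly the same as the syntax to retrieve data2. To avoid this problem, the converter merges the repeated elements into an array: "record": ["data1","data2"] . That is how we arrive at the array format.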
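
    To see what duplicate keys do in practice, here is a small sketch using Python's standard json module. It is only one parser among many, and others may behave differently. By default it does not raise an error but silently keeps the last value, so data1 is lost. The merge_duplicates helper is a made-up name for this illustration; it shows the merge-into-array idea described above.

```python
import json

text = '{"box": {"record": "data1", "record": "data2"}}'

# Default behaviour: the last duplicate key wins and "data1" disappears.
parsed = json.loads(text)
print(parsed)  # {'box': {'record': 'data2'}}


def merge_duplicates(pairs):
    """Collect values of repeated keys into a list instead of overwriting them."""
    merged = {}
    for key, value in pairs:
        if key in merged:
            if not isinstance(merged[key], list):
                merged[key] = [merged[key]]
            merged[key].append(value)
        else:
            merged[key] = value
    return merged


# object_pairs_hook receives every key/value pair, duplicates included.
merged = json.loads(text, object_pairs_hook=merge_duplicates)
print(merged)  # {'box': {'record': ['data1', 'data2']}}
```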

    So I can't see a good solution. XML doesn't indicate whether an element should be treated as a single property or as a collection of one.
    The collection reference expects an array, and when the value isn't an array things break down, as you saw. The collection reference is applied after the conversion, so we can't put smarter logic in the mapping to switch between property and collection on the fly.

    If we could tell the XML reader ahead of time how to interpret the expected structure, and always make something an array/collection, we would have a solution.
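
    As a rough sketch of that idea, assuming the XML could be pre-processed in a standalone script before it reaches the copy activity (this is not a Data Factory feature), the converter below takes a set of element names that must always be emitted as arrays. The names xml_to_dict, convert, and force_list are hypothetical and chosen only for this example; attributes and mixed content are ignored.

```python
import xml.etree.ElementTree as ET


def xml_to_dict(elem, force_list=frozenset()):
    """Convert an element to dict/str; tags named in force_list always become lists."""
    children = list(elem)
    if not children:
        # Leaf element: return its trimmed text.
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = xml_to_dict(child, force_list)
        if child.tag in force_list:
            # Declared as a collection: always a list, even with one item.
            result.setdefault(child.tag, []).append(value)
        elif child.tag in result:
            # Undeclared repeated tag: promote to a list on the second occurrence.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def convert(xml_text, force_list=frozenset()):
    """Parse an XML string and wrap the result under the root tag."""
    root = ET.fromstring(xml_text)
    return {root.tag: xml_to_dict(root, force_list)}


single = convert("<box><record> data1 </record></box>", force_list={"record"})
multiple = convert(
    "<box><record> data1 </record><record> data2 </record></box>",
    force_list={"record"},
)
print(single)    # {'box': {'record': ['data1']}}
print(multiple)  # {'box': {'record': ['data1', 'data2']}}
```

    With the expected structure declared up front, one record and many records produce the same shape, so anything downstream that expects an array sees one in both cases.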

    If you have another perspective, I'd love to hear it. Maybe it can help find a solution that isn't obvious to me.
