Azure Data Factory HTTP connector not returning XML output

Koen van Wielink 16 Reputation points
2021-09-23T06:48:54.223+00:00

For one of our clients we are trying to extract data from a somewhat old fashioned REST API which returns an XML response. We have already discovered that the REST API connector cannot be used for this purpose as it can only handle JSON responses. We therefore switched to the HTTP connector instead which is giving back data, which we were then able to write to an Azure SQL DB sink. The strange thing is that the ADF output is always in JSON, despite the API response being XML (confirmed in Postman).

At first we didn't think this was a problem but I just discovered that array handling is not being done correctly. If multiple records are returned everything is fine, but when only a single record is returned by the API the square brackets around the array are removed and ADF no longer recognizes it correctly in the mapping because it expects an array. As a result the single record is not written to the SQL database.

We have tried using the Content-Type: application/xml and Accept: application/xml header to force an XML output, but that also didn't work. As the API really returns XML, it seems that ADF is converting this to JSON on its own accord and not doing it correctly. Is there any way that we can handle an XML API in ADF using standard connectors? We don't have programmers on our team who can create custom connectors so that's not an option. Any insights would be greatly appreciated.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,004 questions
0 comments No comments
{count} vote

2 answers

Sort by: Most helpful
  1. Koen van Wielink 16 Reputation points
    2021-09-24T08:12:59.873+00:00

    Good morning Sathyamoorthy,

    Thanks for the lengthy response. Unfortunately this solution won't work for us. The first problem is that we have multiple nested arrays within the datasets, all of which could present the same problem. It is not feasible to create separate success/failure flows for each of the arrays in all of our datasets.
    Secondly the funny thing is that the flow does not fail, it just doesn't write any records. So even if we were to set up the success/failure flows as described by you, it would never reach the failure path as no error is generated.
    I'm afraid that it's not going to work to read this API directly from ADF. As such I've spent last night setting up a logic app that does the same, and which will write the XML output to files into an ADLSv2 container. From there I should hopefully be able to process the XML files correctly via ADF.

    Best regards,

    Koen

    1 person found this answer helpful.

  2. svijay-MSFT 5,241 Reputation points Microsoft Employee
    2021-09-24T07:33:58.69+00:00

    Hello @Koen van Wielink

    Welcome to the Microsoft Q&A platform.

    You are right that the XML that has been read is converted to JSON and is used by the service. You could use @xml() to convert string to XML representation - that may not meet your goal - as it cannot be a directly consumed by other activities.

    134900-image.png

    I had done small repro at my end, and I was able to replicate the behavior you have observed.

    For Testing Purpose, I had created two XML for input.

    134935-image.png

    I was able to overcome the behavior by having two copy activities with different mappings - one for multiple rows and one for single rows.

    134962-image.png

    For XML1 - Schema : BookStore --> Book[]
    You will have to create explicit Mapping for the Copy Activity 1 ( Multiple rows as per Above screenshot).
    [ You can do Import Schema in both scenarios - But you should be able to get the Multiple rows and Single row response to do it automatically. ]

    134850-image.png

    The above is just an illustration per my source. You could modify as per your XML Structure

    Note : Book here is an array

    Now for the Copy Activity 2 ( "Single Row") - This executes on the failure of the Copy Activity 1 (Multiple Rows) .

    134936-on-failure.gif

    When only one row is returned - Book will be considered as an Object and not as array - Multiple Rows Copy Activity fails as it cannot find the Book [] . Single Row Copy Activity with the below mapping executes.

    For XML2 - JSON Schema -Bookstore --> Book

    134877-image.png

    Note: Book here is an object.

    Execution :

    1. When Multiple Rows are Returned as a result of HTTP Connector - Multiple Rows Copy Activity succeeds and Single Row Copy Activity is ignored
      134955-image.png
    2. When Single Row is Returned as a result of HTTP Connector - Multiple Rows Copy Activity fails because of schema mapping and Single Row Copy Activity is executed
      134899-image.png

    In both the cases the row(s) are written to the same Sink (SQL DB in your Case)

    Hope this helps. Do let us know if you any further queries.

    --------------------------------------------------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.