Azure Data Factory - Copy activity JSON array with dynamic collection reference

Jos Neutel 21 Reputation points
2022-03-01T14:22:16.907+00:00

Hello,

I'm stuck trying to extract data from a JSON response returned by an API call, because the collection reference is dynamic. The key that names each collection is itself a value I want to extract, and it varies between responses.

Below is an example of a possible JSON response:

[
  {
    "10180": [
      {
        "origin": "m",
        "status": "v",
        "timestamp": 1596232800,
        "value": 851.6355
      }
    ]
  },
  {
    "10280": [
      {
        "origin": "m",
        "status": "v",
        "timestamp": 1596232800,
        "value": 30
      }
    ]
  }
]

The numbers represent channels, but they are not named like "channelnumber": "10280". Instead the response only contains the channel number as the key to which the child array belongs.
Not every API response contains the same channel numbers, and there are around 50 possible channels.

I hope someone understands what I'm trying to describe here. The output I would like to have contains: channelnumber, origin, status, timestamp, value.
At the moment I'm using a Copy Data activity to reach my goal, without success.
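To illustrate the shape I'm after, here is a sketch in Python (outside ADF, purely to show the desired flattening; the function name is illustrative):

```python
import json

def flatten_channels(payload):
    """Flatten [{"<channel>": [ {...fields...} ]}] into flat rows."""
    rows = []
    for obj in payload:                       # each object holds one channel key
        for channel, readings in obj.items():
            for reading in readings:          # the child array for that channel
                rows.append({"channelnumber": channel, **reading})
    return rows

sample = json.loads('''
[
  {"10180": [{"origin": "m", "status": "v", "timestamp": 1596232800, "value": 851.6355}]},
  {"10280": [{"origin": "m", "status": "v", "timestamp": 1596232800, "value": 30}]}
]
''')
rows = flatten_channels(sample)
```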

Any help or tips are welcome.

Azure Data Factory

Accepted answer
  1. ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator
    2022-03-04T13:40:43.347+00:00

    Hi @Jos Neutel ,

    Thank you for posting query in Microsoft Q&A Platform and thanks for helping on clarifications.

As I understand the ask here, you are trying to load your JSON data into a Parquet file while adding a new key called channelnumber to the existing JSON. Correct me if that is wrong.

You can do that by looping through your input JSON array and using a Copy activity inside the loop. Under the source of the Copy activity, use a SQL dataset so that you can leverage the SQL engine to convert your JSON into the desired format and load it into a table in the database. Finally, once all data is loaded into the database, use another Copy activity outside the loop to take that data and write it as Parquet to the final destination.

Please check the screenshots and details below to follow the suggested implementation.

In the example below I am passing your input JSON as a parameter value. In your case it comes from the REST API.
    Step1: Pipeline parameter(array type) which holds input json array.
    180108-image.png

Step2: Pass the step1 parameter to a Foreach activity to loop through each item.
    180181-image.png

Step3: Inside the Foreach activity, take the current item of the JSON array into a variable.
    180137-image.png

Step4: Inside the Foreach activity, add a Copy activity. Under its source, use a SQL dataset with the query option, and write a query that transforms the JSON data as needed.
    180165-image.png

Code used in the above screenshot:

    DECLARE @json NVARCHAR(MAX);
    -- The ADF expression @{variables('jsondata')} is interpolated into the query text before SQL runs it
    SET @json = N'@{variables('jsondata')}';

    -- OPENJSON with the default schema returns one (key, value) pair per property,
    -- so the dynamic channel number surfaces as [key]
    SELECT [key] AS channelnumber,
           JSON_VALUE([value], '$[0].origin')    AS origin,
           JSON_VALUE([value], '$[0].status')    AS [status],
           JSON_VALUE([value], '$[0].timestamp') AS [timestamp],
           JSON_VALUE([value], '$[0].value')     AS [value]
    FROM OPENJSON(@json, '$');
    

Please check the screenshot below, where I ran a similar query in SQL for your reference.
    180191-image.png
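Note that JSON_VALUE with the '$[0]' path reads only the first element of each channel's array. If a channel can return more than one reading, every array element needs to be walked. The equivalent key/value walk, sketched in Python purely for illustration (not part of the pipeline):

```python
import json

doc = '{"10180": [{"origin": "m", "value": 851.6355}, {"origin": "m", "value": 852.0}]}'

# Like OPENJSON's default schema, items() yields one (key, value) pair per
# property; iterating the array keeps every reading, not just element [0].
rows = [
    {"channelnumber": key, **reading}
    for key, readings in json.loads(doc).items()
    for reading in readings
]
```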

    Hope this helps. Please let us know if any further queries.

    -------------

Please consider hitting the Accept Answer button. Accepted answers help the community as well.


1 additional answer

  1. Jos Neutel 21 Reputation points
    2022-03-07T15:47:05.177+00:00

    Thank you for your extended reply!

I haven't been able to pull it off following your steps.
The main reason is that I could not isolate the pure JSON body from the web activity output. Whenever I pass activity('web activity').output it passes the whole output object, but I only need the JSON returned by the web activity. When I try activity('web activity').output.value it says that property doesn't exist and offers keys like "61820" instead, since the top-level keys are the dynamic channel numbers. So we're back at square one, I think.

I ended up sinking the requests as JSON into the data lake > copying to a SQL database (mapping complex values to string) > running a stored procedure that uses a variant of the SQL you provided to turn the JSON into a viable table format.

