Help with derived columns please

sam nick 366 Reputation points
2021-12-07T05:37:33.537+00:00

Hello,
In my incoming feed, i have an array and i'm loading the fields into my sql DB as below
155553-image.png

Under the CurrentAssetMax field , the initial values within the "square brackets" are the same as ID and then the individual values in the next square bracket set and followed by their respective values in the last square brackets. For ID = 2678734, the DOCID value in the second value within the last square bracket. i.e if it comes the third, then the third value is the corresponding value.

My end goal is to achieve the below.
155349-image.png

I was thinking of using a derived column in the dataflow using instr, substring etc. But unable to get the right formula. Please can you help me with the formula (or) a better recommendation to get the end result

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,161 Reputation points
    2021-12-07T21:27:49.973+00:00

    Hello @sam nick and welcome back again. Here are my initial thoughts (not final solution).

    I see the format is of [numbers I don't care about] [comma separated tags] [numbers of same order of tags]

    So the first step is to isolate each collection (stuff in brackets).
    Second step is to discard the first collection
    Third step is to either loop over the length of the remaining collection, or iterate over their items.
    During the loop/iteration we need to somehow put the values into new columns.

    For the first step, I am thinking of splitting on ][ and then cleaning up the remaining ]. I would use dropright function to remove the last ] character.
    For the third step I am thinking of using maploop( length of tags collection).

    If the number of tags is fixed and small, a simpler brute-force solution is possible.
    Similar beginning steps, separating and discarding first collection.
    Then do something like "if column name is found in tag collection, then return value at the index it is found in tags"

    I'll work out the particular details today or tomorrow.


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.