byName function to select column value from a json dataset in dataflow

Victor Brito 1 Reputation point
2020-10-21T14:45:26.043+00:00

Hi,

I have a dataset with json data as source in a dataflow and want to add a derived column to the stream that will be calculated like this

concat('/folder/',toString(byName('attribute')),'.json')

This works fine

But it my attribute is nested within a json object it does not work... for example

concat('/folder/',toString(byName('parentObject.attribute')),'.json')

If i turn schema drift ,concat('/folder/',parentObject.attribute,'.json') will work fine but this is not what i need since it needs to be dynamic.

On more remark : toString(byName('parentObject') would return the json string of the inner object but how to parse and get only 'attribute' ?

Thanks for the help

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,586 questions
{count} votes

2 answers

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,081 Reputation points
    2020-10-22T00:32:27.83+00:00

    Hello @Victor Brito and welcome to Microsoft Q&A. Thank you for the excellent question. I believe I have an answer for you. Please let me share my steps.

    It sounds like you are passing in a JSON object which includes metadata. To stand in for this, I created a parameter, and gave it the string literal

    '{"ParentObject":{"foo":"bar","attribute":"thing_I_want"},"otherObject":4}'  
    

    While trying to access the JSON as an object is the commonsense approach, since it isn't working, I decided another approach was in order.

    Since JSON is fundamentally just well-formatted text, I figured a regular expression should be able to extract what we need.

    byName(regexExtract($parameter1,`"attribute":"(\w+)"`))  
    

    I tested this by adding a column named "thing_I_want" , and placing the above in a later derived column1.
    34192-image.png
    34183-image.png

    0 comments No comments

  2. Kiran-MSFT 691 Reputation points Microsoft Employee
    2020-10-22T22:43:46.593+00:00

    You should use byItem. Either -

    toString(byItem( byName('parentObject'), 'attribute'))

    or

    byItem( byName('parentObject'), 'attribute') ? string


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.