How to split JSON keys and values separately without Data Flow in Azure Data Factory

Uday Juttu 1 Reputation point
2022-08-10T03:01:58.5+00:00

I have a HubSpot dataset and I am using a Lookup activity, which returns the first row of the dataset:

Firstrow : {
    "col1": "value1",
    "col2": "value2"
}

I want to get all column names from the output JSON (i.e. col1, col2) so that I can do schema validation for the table. I am wondering how to get the column information dynamically.

Note: I cannot use Data Flow or the Get Metadata activity, because HubSpot datasets do not support them.


1 answer

  1. MartinJaffer-MSFT 26,236 Reputation points
    2022-08-10T16:51:00.513+00:00

    Hello @Uday Juttu ,
    Thanks for the question and using MS Q&A platform.

    So you want to check all the column names for schema validation without using dataflow.

    Much to my frustration, there is no function for returning all the keys/properties of a JSON. If there was, I could do so many things.

    However, there is another way. Instead of trying to get a list of all keys from the Lookup activity's output, provide the list of expected column names as a pipeline parameter. Then run a ForEach over that list of expected names and, for each item, evaluate:

    @contains( activity('lookup').output.firstRow , item() )  
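
    The ForEach + contains() check above can be sketched in Python. This is only an illustration of the logic, not ADF code; the sample dictionary stands in for `activity('lookup').output.firstRow`, and `expected` stands in for a hypothetical pipeline parameter holding the expected column names.

    ```python
    # Sample data standing in for the Lookup output and the pipeline parameter.
    first_row = {"col1": "value1", "col2": "value2"}  # activity('lookup').output.firstRow
    expected = ["col1", "col2"]                       # hypothetical parameter list_to_compare

    # ADF's contains() on an object tests for a key, like Python's `in` operator.
    # The ForEach loop amounts to checking each expected name against the row.
    missing = [name for name in expected if name not in first_row]
    print(missing)  # [] when every expected column is present
    ```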
    

    This would verify that each name you provide exists, but it doesn't verify that there are no extra columns. For that we need another test. Since every JSON key-value pair has a : between the key and its value, we can count the pairs by serializing the row to a string, splitting it on : , and subtracting 1 from the length of the resulting array.

    @sub(  
      length(  
        split(  
          string(activity('lookup').output.firstRow)  
        , ':')  
      )  
    ,1)  
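
    As a rough Python equivalent of the expression above (again just a sketch, with a sample dictionary standing in for the Lookup output):

    ```python
    import json

    first_row = {"col1": "value1", "col2": "value2"}  # activity('lookup').output.firstRow

    # string(activity('lookup').output.firstRow) serializes the row to JSON text;
    # splitting on ':' yields one more piece than there are colons, so subtract 1.
    as_text = json.dumps(first_row)
    column_count = len(as_text.split(":")) - 1
    print(column_count)  # 2
    ```

    Note that this is a heuristic: it over-counts if any *value* itself contains a colon (for example a URL or a timestamp), so treat the result with care.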
    

    Of course we want to compare to the length of your list of expected columns, so wrap that in

    @equals(  
      length(pipeline().parameters.list_to_compare)  
    , sub(  
        length(  
          split(  
            string(activity('lookup').output.firstRow)  
          , ':')  
        )  
      ,1)  
    )  
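
    Putting both pieces together, the final comparison amounts to the following (a Python sketch with sample data; `expected` stands in for the hypothetical list_to_compare parameter):

    ```python
    import json

    first_row = {"col1": "value1", "col2": "value2"}  # activity('lookup').output.firstRow
    expected = ["col1", "col2"]                       # hypothetical parameter list_to_compare

    # Count key-value pairs by counting colons in the serialized row.
    count_of_lookup_columns = len(json.dumps(first_row).split(":")) - 1

    # equals(length(pipeline().parameters.list_to_compare), count_of_lookup_columns)
    schema_matches = len(expected) == count_of_lookup_columns
    print(schema_matches)  # True
    ```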
    

    Please do let me know if you have any queries.

    Thanks
    Martin


