Unable to flatten a complex json object using mapping data flow flatten activity

Osamwonyi Alekwe 40 Reputation points
2023-04-22T12:21:21.8633333+00:00

I have a complex json file in my blob storage that am trying to flatten, it seems to grey out some of the attributes, hence am unable to unroll with those attribute, see initial json structure below: User's image

expanded structure is shown: User's image

Am unable to unroll with the agent (complex object) as its graded out: as shown below User's image

please can i get any assistance to resolve this, thank you

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,991 questions
{count} votes

2 answers

Sort by: Most helpful
  1. KranthiPakala-MSFT 46,597 Reputation points Microsoft Employee
    2023-04-28T18:32:18.6133333+00:00

    Hi @Osamwonyi Alekwe

    Thanks for your response and appreciate for sharing the sample data.

    User's image

    I see that you have 4 array objects which will have to flatten. In order to flatten them you have to have 4 subsequent flatten transformations as shown below to flatten them at each level. At the end you will see the data flattened as table structure.

    User's image

    User's image

    User's image

    User's image

    And for each flatten transformation you can verify the data by clicking on data preview as below:

    User's image

    Here is the dataflow script I tested for your reference (please note I might have missed few columns but you can add them as per your requirment):

    source(output(
    		entities as (agent as (id as string, name as string, division as (id as string, name as string, selfUri as string), chat as (jabberId as string), email as string, primaryContactInfo as (address as string, mediaType as string, type as string)[], addresses as string[], state as string, username as string, manager as (id as string, selfUri as string), version as integer, acdAutoAnswer as boolean, selfUri as string), numEvaluations as integer, averageEvaluationScore as integer, numCriticalEvaluations as integer, highestEvaluationScore as double, lowestEvaluationScore as double, agentEvaluatorActivityList as (agent as (id as string, name as string, division as (id as string, name as string, selfUri as string), chat as (jabberId as string), email as string, primaryContactInfo as (address as string, mediaType as string, type as string)[], addresses as string[], state as string, username as string, manager as (id as string, selfUri as string), version as integer, acdAutoAnswer as boolean, selfUri as string), evaluator as (id as string, name as string, division as (id as string, name as string, selfUri as string), chat as (jabberId as string), department as string, email as string, primaryContactInfo as (address as string, mediaType as string, type as string)[], addresses as string[], state as string, title as string, username as string, manager as (id as string, selfUri as string), version as integer, acdAutoAnswer as boolean, selfUri as string), numEvaluations as integer, averageEvaluationScore as integer, numEvaluationsWithoutViewPermission as integer)[], numEvaluationsWithoutViewPermission as integer)[],
    		pageSize as integer,
    		pageNumber as integer,
    		total as integer,
    		nextUri as string,
    		lastUri as string,
    		firstUri as string,
    		selfUri as string,
    		pageCount as integer
    	),
    	allowSchemaDrift: true,
    	validateSchema: false,
    	ignoreNoFilesFound: false,
    	documentForm: 'documentPerLine') ~> source1
    source1 foldDown(unroll(entities, entities),
    	mapColumn(
    		entity_agent_id = entities.agent.id,
    		entity_agent_name = entities.agent.name,
    		entities_agent_division_id = entities.agent.division.id,
    		entities_agent_division_name = entities.agent.division.name,
    		entities_agent_division_selfUri = entities.agent.division.selfUri,
    		entities_agent_chat_jabberId = entities.agent.chat.jabberId,
    		entities_agent_email = entities.agent.email,
    		entities_agent_state = entities.agent.state,
    		entities_agent_username = entities.agent.username,
    		entities_agent_manager_id = entities.agent.manager.id,
    		entities_agent_manager_selfUri = entities.agent.manager.selfUri,
    		entities_agent_version = entities.agent.version,
    		entities_agent_acdAutoAnswer = entities.agent.acdAutoAnswer,
    		entity_numEvaluations = entities.numEvaluations,
    		entity_averageEvaluationScore = entities.averageEvaluationScore,
    		entity_numCriticalEvaluations = entities.numCriticalEvaluations,
    		entity_highestEvaluationScore = entities.highestEvaluationScore,
    		entity_lowestEvaluationScore = entities.lowestEvaluationScore,
    		entity_numEvaluationsWithoutViewPermission = entities.numEvaluationsWithoutViewPermission,
    		entity_agent_addresses = entities.agent.addresses,
    		primaryContactInfo = entities.agent.primaryContactInfo,
    		entity_agentEvaluatorActivityList = entities.agentEvaluatorActivityList
    	),
    	skipDuplicateMapInputs: false,
    	skipDuplicateMapOutputs: false) ~> flattenEntities
    flattenEntities foldDown(unroll(primaryContactInfo, primaryContactInfo),
    	mapColumn(
    		entity_agent_id,
    		entity_agent_name,
    		entities_agent_division_id,
    		entities_agent_division_name,
    		entities_agent_division_selfUri,
    		entities_agent_chat_jabberId,
    		entities_agent_email,
    		entities_agent_state,
    		entities_agent_username,
    		entities_agent_manager_id,
    		entities_agent_manager_selfUri,
    		entities_agent_version,
    		entities_agent_acdAutoAnswer,
    		entity_numEvaluations,
    		entity_averageEvaluationScore,
    		entity_numCriticalEvaluations,
    		entity_highestEvaluationScore,
    		entity_lowestEvaluationScore,
    		entity_numEvaluationsWithoutViewPermission,
    		entity_agent_primaryContactInfo_address = primaryContactInfo.address,
    		entity_agent_primaryContactInfo_mediaType = primaryContactInfo.mediaType,
    		entity_agent_primaryContactInfo_type = primaryContactInfo.type,
    		entity_agent_addresses,
    		entity_agentEvaluatorActivityList = entity_agentEvaluatorActivityList
    	),
    	skipDuplicateMapInputs: false,
    	skipDuplicateMapOutputs: false) ~> flattenAgentPrimaryContactInfo
    flattenAgentPrimaryContactInfo foldDown(unroll(entity_agentEvaluatorActivityList, entity_agentEvaluatorActivityList),
    	mapColumn(
    		entity_agent_id,
    		entity_agent_name,
    		entities_agent_division_id,
    		entities_agent_division_name,
    		entities_agent_division_selfUri,
    		entities_agent_chat_jabberId,
    		entities_agent_email,
    		entities_agent_state,
    		entities_agent_username,
    		entities_agent_manager_id,
    		entities_agent_manager_selfUri,
    		entities_agent_version,
    		entities_agent_acdAutoAnswer,
    		entity_numEvaluations,
    		entity_averageEvaluationScore,
    		entity_numCriticalEvaluations,
    		entity_highestEvaluationScore,
    		entity_lowestEvaluationScore,
    		entity_numEvaluationsWithoutViewPermission,
    		entity_agent_primaryContactInfo_address,
    		entity_agent_primaryContactInfo_mediaType,
    		entity_agent_primaryContactInfo_type,
    		entity_agentEvaluatorActivityList_id = entity_agentEvaluatorActivityList.agent.id,
    		entity_agentEvaluatorActivityList_name = entity_agentEvaluatorActivityList.agent.name,
    		entity_agentEvaluatorActivityList_division_id = entity_agentEvaluatorActivityList.agent.division.id,
    		entity_agentEvaluatorActivityList_division_name = entity_agentEvaluatorActivityList.agent.division.name,
    		entity_agentEvaluatorActivityList_division_selfUri = entity_agentEvaluatorActivityList.agent.division.selfUri,
    		entity_agentEvaluatorActivityList_chat_jabberId = entity_agentEvaluatorActivityList.agent.chat.jabberId,
    		entity_agentEvaluatorActivityList_email = entity_agentEvaluatorActivityList.agent.email,
    		entity_agentEvaluatorActivityList_primaryContactInfo = entity_agentEvaluatorActivityList.agent.primaryContactInfo,
    		entity_agentEvaluatorActivityList_addresses = entity_agentEvaluatorActivityList.agent.addresses,
    		entity_agentEvaluatorActivityList_state = entity_agentEvaluatorActivityList.agent.state,
    		entity_agentEvaluatorActivityList_username = entity_agentEvaluatorActivityList.agent.username,
    		entity_agentEvaluatorActivityList_manager_id = entity_agentEvaluatorActivityList.agent.manager.id,
    		entity_agentEvaluatorActivityList_manager_selfUri = entity_agentEvaluatorActivityList.evaluator.manager.selfUri,
    		entity_agentEvaluatorActivityList_version = entity_agentEvaluatorActivityList.agent.version,
    		entity_agentEvaluatorActivityList_acdAutoAnswer = entity_agentEvaluatorActivityList.agent.acdAutoAnswer,
    		entity_agentEvaluatorActivityList_selfUri = entity_agentEvaluatorActivityList.agent.selfUri,
    		entity_agentEvaluatorActivityList_evaluator_id = entity_agentEvaluatorActivityList.evaluator.id,
    		entity_agentEvaluatorActivityList_evaluator_name = entity_agentEvaluatorActivityList.evaluator.name,
    		entity_agentEvaluatorActivityList_evaluator_division_id = entity_agentEvaluatorActivityList.evaluator.division.id,
    		entity_agentEvaluatorActivityList_evaluator_division_name = entity_agentEvaluatorActivityList.evaluator.division.name,
    		entity_agentEvaluatorActivityList_evaluator_division_selfUri = entity_agentEvaluatorActivityList.evaluator.division.selfUri
    	),
    	skipDuplicateMapInputs: false,
    	skipDuplicateMapOutputs: false) ~> flattenAgentEvaluatorActivityList
    flattenAgentEvaluatorActivityList foldDown(unroll(entity_agentEvaluatorActivityList_primaryContactInfo, entity_agentEvaluatorActivityList_primaryContactInfo),
    	mapColumn(
    		entity_agent_id,
    		entity_agent_name,
    		entities_agent_division_id,
    		entities_agent_division_name,
    		entities_agent_division_selfUri,
    		entities_agent_chat_jabberId,
    		entities_agent_email,
    		entities_agent_state,
    		entities_agent_username,
    		entities_agent_manager_id,
    		entities_agent_manager_selfUri,
    		entities_agent_version,
    		entities_agent_acdAutoAnswer,
    		entity_numEvaluations,
    		entity_averageEvaluationScore,
    		entity_numCriticalEvaluations,
    		entity_highestEvaluationScore,
    		entity_lowestEvaluationScore,
    		entity_numEvaluationsWithoutViewPermission,
    		entity_agent_primaryContactInfo_address,
    		entity_agent_primaryContactInfo_mediaType,
    		entity_agent_primaryContactInfo_type,
    		entity_agentEvaluatorActivityList_id,
    		entity_agentEvaluatorActivityList_name,
    		entity_agentEvaluatorActivityList_division_id,
    		entity_agentEvaluatorActivityList_division_name,
    		entity_agentEvaluatorActivityList_division_selfUri,
    		entity_agentEvaluatorActivityList_chat_jabberId,
    		entity_agentEvaluatorActivityList_email,
    		entity_agentEvaluatorActivityList_state,
    		entity_agentEvaluatorActivityList_username,
    		entity_agentEvaluatorActivityList_manager_id,
    		entity_agentEvaluatorActivityList_manager_selfUri,
    		entity_agentEvaluatorActivityList_version,
    		entity_agentEvaluatorActivityList_acdAutoAnswer,
    		entity_agentEvaluatorActivityList_selfUri,
    		entity_agentEvaluatorActivityList_evaluator_id,
    		entity_agentEvaluatorActivityList_evaluator_name,
    		entity_agentEvaluatorActivityList_evaluator_division_id,
    		entity_agentEvaluatorActivityList_evaluator_division_name,
    		entity_agentEvaluatorActivityList_evaluator_division_selfUri,
    		entity_agentEvaluatorActivityList_evaluator_primaryContactInfo_address = entity_agentEvaluatorActivityList_primaryContactInfo.address,
    		entity_agentEvaluatorActivityList_evaluator_primaryContactInfo_mediaType = entity_agentEvaluatorActivityList_primaryContactInfo.mediaType,
    		entity_agentEvaluatorActivityList_evaluator_primaryContactInfo_type = entity_agentEvaluatorActivityList_primaryContactInfo.type
    	),
    	skipDuplicateMapInputs: false,
    	skipDuplicateMapOutputs: false) ~> flattenEvaluatorActivityListPrimaryContactInfo
    flattenEvaluatorActivityListPrimaryContactInfo sink(allowSchemaDrift: true,
    	validateSchema: false,
    	input(
    		author as string,
    		title as string,
    		genre as string,
    		price as string,
    		pub_date as string,
    		review as string,
    		{@id} as string,
    		fileName as string
    	),
    	partitionFileNames:['FlattenedComplextJson.csv'],
    	umask: 0022,
    	preCommands: [],
    	postCommands: [],
    	skipDuplicateMapInputs: true,
    	skipDuplicateMapOutputs: true,
    	partitionBy('hash', 1)) ~> sinkFlattened
    

    Hope this info helps.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.


  2. KranthiPakala-MSFT 46,597 Reputation points Microsoft Employee
    2023-05-12T16:03:41.89+00:00

    @Osamwonyi Alekwe Thanks for the follow-up. As per my initial analysis, it seems like data inconsistency, which is resulting in this behavior.

    Here is the observation from analyzing couple of records. agentEvaluatorActivityList is complex array object. But as per my observation for few of the records it is empty, and no value is passed from source. Hence, as the source data is not uniform/consistent this is resulting in missing records in the final output.

    User's image

    Will it be possible for you to check with the team who provide you the source data to ensure that the data is consistent or uniform for all the records? At least if the source data have a default value when there is no value for them, it would avoid this issue.

    In case if your source data in not uniform only for agentEvaluatorActivityList then what you can do is you can have a conditional split transformation to check the size of agentEvaluatorActivityListand have two different streams. After further flattening of agentEvaluatorActivityList then you can do a Union of the two streams that way your final target will have all the records. Below is a sample.

    User's image

    From the sample file you have provided, I see 20 records have the agentEvaluatorActivity data and 5 records' doesn't have the agentEvaluedActivity data.

    User's image

    User's image

    Before your final sink, do a Union of both the flattened data so that you will have complete records.

    User's image

    User's image

    Once you do the union you will see the records that doesn't have agentEvaluedActivity with empty values as below:
    User's image

    And if you have multiple non-uniformed/inconsistent array objects, then the ideal way is to fix the source data rather than trying to fix it in ADF data flow.

    Hope this info helps.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.