Improve performance in JSON parsing using data flow

John 190 Reputation points
2024-01-16T23:08:58.24+00:00

Hi, I am parsing a JSON payload using a data flow. The payload has various nested nodes (arrays and objects), so I have to flatten it multiple times. The data flow looks like this: (screenshot: Mapping data flow.JPG). Processing one article takes about 3 minutes. Is there a way to improve the performance? Thanks.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Kyle1245 420 Reputation points
    2024-01-18T18:04:06.0933333+00:00

    You can consider using the built-in capabilities of Azure Data Factory Mapping Data Flow for JSON parsing.

    {
        "name": "CustomJSONParsingDataFlow",
        "type": "MappingDataFlow",
        "activities": [
            {
                "name": "Source",
                "type": "Source",
                "output": {
                    "name": "SourceOutput"
                }
            },
            {
                "name": "Flatten1",
                "type": "Flatten",
                "linkedService": {
                    "referenceName": "YourLinkedService1",
                    "type": "LinkedServiceReference"
                },
                "inputs": [
                    {
                        "referenceName": "SourceOutput"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "FlattenOutput1"
                    }
                ]
            },
            {
                "name": "Flatten2",
                "type": "Flatten",
                "linkedService": {
                    "referenceName": "YourLinkedService2",
                    "type": "LinkedServiceReference"
                },
                "inputs": [
                    {
                        "referenceName": "FlattenOutput1"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "FlattenOutput2"
                    }
                ]
            },
            {
                "name": "Sink",
                "type": "Sink",
                "inputs": [
                    {
                        "referenceName": "FlattenOutput2"
                    }
                ],
                "linkedService": {
                    "referenceName": "YourDestinationLinkedService",
                    "type": "LinkedServiceReference"
                }
            }
        ]
    }
    
    

    This example uses two Flatten transformations to handle two levels of nested arrays. Treat the JSON as a schematic outline rather than an exact data flow definition, and replace placeholders such as "YourLinkedService1", "YourLinkedService2", and "YourDestinationLinkedService" with your actual linked services. Adjust the number of Flatten transformations to match your JSON structure; each one can be configured to unroll a specific level of nesting. This approach relies on the native capabilities of Data Flow for nested structures instead of custom scripts, which can sometimes improve performance.
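
To make concrete what each Flatten step does to the data, here is a minimal Python sketch. It is not ADF code and not how Data Flow runs internally; it only illustrates unrolling one array level per step. The helper names `unroll` and `flatten_article`, and the fields `articleId`, `sections`, `sectionTitle`, `paragraphs`, and `text`, are hypothetical, since the real payload structure is not shown in the question.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def unroll(rows: Iterable[Row], array_key: str) -> Iterator[Row]:
    """Emit one row per element of row[array_key], keeping the parent's other fields."""
    for row in rows:
        parent = {k: v for k, v in row.items() if k != array_key}
        # In this sketch, a missing or empty array yields no output rows.
        for item in row.get(array_key) or []:
            yield {**parent, **item}


def flatten_article(payload: Row) -> List[Row]:
    """Two chained unrolls, mirroring Flatten1 -> Flatten2 in the outline above."""
    level1 = unroll([payload], "sections")   # one row per section
    level2 = unroll(level1, "paragraphs")    # one row per paragraph
    return list(level2)


# Hypothetical payload, assumed only for illustration.
sample = {
    "articleId": "A1",
    "sections": [
        {"sectionTitle": "Intro", "paragraphs": [{"text": "p1"}, {"text": "p2"}]},
        {"sectionTitle": "Body", "paragraphs": [{"text": "p3"}]},
    ],
}

rows = flatten_article(sample)
for r in rows:
    print(r)
```

Each unroll multiplies the row count by the size of the array it expands, which is one reason a long chain of flattens over a deeply nested payload can become expensive.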
