How to split a CSV file based on columns

Gorakh Nath Sharma 0 Reputation points
2024-05-22T10:50:06.97+00:00

Hi Team,

There is a scenario in which we have to split the columns of a CSV file into separate files in Blob storage. For example, suppose I need to create each file from the combination of (TimeStart + TimeEnd + A), i.e. three columns per file, with the filename taken from the header of column A. The second file would be (TimeStart + TimeEnd + B), again three columns, with the filename taken from the header of column B. So for the file below, 9 files need to be created. I want to do this dynamically, because I don't know how many columns each file has; one file may have 10 columns and another may have 6. The TimeStart and TimeEnd columns, however, are always present.

I can only use ADF for this, so please guide me on how I can do this through ADF only.

User's image

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Amira Bedhiafi 41,121 Reputation points Volunteer Moderator
    2024-05-22T20:06:36.63+00:00

    You can use a Lookup activity to read the header row of the CSV file and dynamically identify its columns (set the First row only option to true).

    Then, add a ForEach activity to iterate over the columns identified by the Lookup activity. Inside the ForEach, add a Data Flow activity that handles the creation of an individual file for each column.

    You can set the Items property to an expression over the Lookup output, e.g. @activity('LookupHeaders').output.value. Note, however, that with First row only enabled the Lookup returns a single object under output.firstRow rather than an array, so the header row first has to be turned into an array of column names.
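    The ForEach needs an array of column names to iterate over. One way to build it (an assumption about your setup, not the only option) is to point the Lookup at a dataset with "First row as header" disabled and a column delimiter that never occurs in the data, so the whole header line lands in the single default column Prop_0, and then split it:

    @split(activity('LookupHeaders').output.firstRow.Prop_0, ',')

    Each item of the resulting array is then one column name, so inside the ForEach, item() evaluates to 'TimeStart', 'TimeEnd', 'A', 'B', and so on.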

    In the Data Flow, you read the CSV file and select the necessary columns (TimeStart, TimeEnd, and the current column from the ForEach activity).

    The Select transformation:

    @concat('TimeStart, TimeEnd, ', item().ColumnName)
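    Note that item() belongs to the pipeline expression language and is not available inside a mapping data flow, so the column name is usually passed in as a data flow parameter (here assumed to be named $columnName) and the selection done with a rule-based mapping. A rough data flow script sketch, not a definitive implementation:

    src select(mapColumn(
        each(match(name == 'TimeStart' || name == 'TimeEnd' || name == $columnName))
    )) ~> SelectColumns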
    

    The Sink transformation:

    @concat(item().ColumnName, '.csv')
    
    

    Then, you write the data to a new CSV file in Azure Blob Storage, naming the file after the current column.
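    For this to work, the current column name has to reach the data flow. One way (sketched here; the parameter name columnName is an assumption) is to declare a string parameter on the data flow and pass item() to it from the Data Flow activity:

    "dataflow": {
      "referenceName": "SplitCSVDataFlow",
      "type": "DataFlowReference",
      "parameters": {
        "columnName": {
          "value": "'@{item()}'",
          "type": "Expression"
        }
      }
    }

    The extra single quotes wrap the interpolated value so it reaches the data flow expression language as a string literal.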

    Here is an example of how your ADF pipeline JSON might look:

    
    {
      "name": "SplitCSVPipeline",
      "properties": {
        "activities": [
          {
            "name": "LookupHeaders",
            "type": "Lookup",
            "typeProperties": {
              "source": {
                "type": "DelimitedTextSource"
              },
              "dataset": {
                "referenceName": "YourCSVFileDataset",
                "type": "DatasetReference"
              },
              "firstRowOnly": true
            }
          },
          {
            "name": "ForEachColumn",
            "type": "ForEach",
            "dependsOn": [
              {
                "activity": "LookupHeaders",
                "dependencyConditions": [ "Succeeded" ]
              }
            ],
            "typeProperties": {
              "items": {
                "value": "@split(activity('LookupHeaders').output.firstRow.Prop_0, ',')",
                "type": "Expression"
              },
              "activities": [
                {
                  "name": "DataFlowActivity",
                  "type": "ExecuteDataFlow",
                  "typeProperties": {
                    "dataflow": {
                      "referenceName": "SplitCSVDataFlow",
                      "type": "DataFlowReference"
                    }
                  }
                }
              ]
            }
          }
        ],
        "variables": []
      }
    }
    
    
    
