Additional column that records the row number automatically in a copy activity in Azure Data Factory

Tinu Oommen 21 Reputation points
2021-06-02T12:13:52.777+00:00

I have a Copy Activity in my ADF Pipeline which copies an excel worksheet data to a JSON sink. I am looking for a way to add a custom column that can keep a track of the row number while copying the data. For example,

Name Value
ABC 123
DEF 456
GHI 789
The above should be converted into the below JSON:

[{
"Name": "ABC",
"Value": "123",
"Row Number": 1
},
{
"Name": "DEF",
"Value": "456",
"Row Number": 2
},
{
"Name": "GHI",
"Value": "789",
"Row Number": 3
}]

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,533 questions
0 comments No comments
{count} votes

Accepted answer
  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2021-06-03T06:44:43.193+00:00

    Hi @Tinu Oommen ,

    Welcome to Microsoft Q&A forum and thanks for your query.

    Unfortunately Copy activity is intended to copy data among data stores located on-premises and in the cloud. After you copy the data, you can use other activities to further transform and analyze data. For your requirement Mapping data flow activity is better suited. Please see below steps to achieve your requirement using Mapping data flow.

    1. Below is how source looks like: 101957-image.png
    2. Then add a Derived Column transformation with the ConstantValueColumn ConstantValue_1 and value set to 1 or any other constant value of your preference. 102003-image.png Your Data preview looks like below
      101993-image.png
    3. Then add a Window transformation with the rowNumber() function and ConstantValue_1as a sorting column.
      101923-image.png 101920-image.png Data preview looks like below:
      101936-image.png
    4. Then add a sink transofrmation and do map only the required columns 101976-image.png Data Preview:
      101982-image.png

    Hope this helps. Do let us know if you have further query.

    ----------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.

    1 person found this answer helpful.

2 additional answers

Sort by: Most helpful
  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2021-06-10T15:02:29.923+00:00

    Hi @Tinu Oommen

    Apologies for the delayed response.

    I did tried multiple ways and finally was able to come up with a workaround to get the desired output. You will need additional 3 transformations followed by Surrogate Key transformation to make the JSON hierarchy (array document). Please see below image.

    104270-image.png

    Here is the GIF of implementation:

    104323-makejsonhierarchy.gif

    Please note that, in the sink transformation, I have used a delimited dataset but in the sink file name options I have provided a file name with .json extension (EXPRESSION USED : concat('CSV_Output_',toString(currentDate('PST'), 'yyyyMMdd'),'.json')) so that your output will save a JSON, also set the partition to single partition.

    104258-image.png

    And also please note after creating the delimited dataset, you will have to click on Edit for both Escape character and Quote character and enter space, so that output will not have backslash and double quote characters.

    104305-image.png

    Hope this helps. Do let us know if you have further query.


  2. Federico 0 Reputation points
    2023-08-18T08:35:46.1133333+00:00

    Hi.

    What about the surrogate key transformation? That should solve it without messy workarounds.

    https://learn.microsoft.com/en-us/azure/data-factory/data-flow-surrogate-key

    0 comments No comments