What is the purpose of the dataset parameter, its default value, and @item().name?

Justin Doh 920 Reputation points
2023-08-03T23:17:20.6733333+00:00

Hi.

I am trying to replicate the pipeline from this video.


All About BI!

https://youtu.be/bkJwgEzos9k?t=61

But I am still stuck after replicating the pipeline from this video.

What I am trying to do is:

  1. Check the structure of several CSV files inside one folder.
  2. Then compare it against the structure of one CSV file (the reference file).

First, I understand these:

  1. I need to somehow pass the child item info from the source files.
  2. Then I need to pass it using this expression: @activity('Get Metadata1').output.childItems

What I do not understand is the following:

  1. Why do we need a parameter on the second dataset (in this case 'filename'), and what should I put as the Default value inside the box?


  2. What is the purpose of "@item().name"?


I am trying to understand the logic of the parameter on the second dataset and also why we need @item().name.

Thanks.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Answer accepted by question author
  1. Nandan Hegde 36,716 Reputation points MVP Volunteer Moderator
    2023-08-04T16:50:32.5933333+00:00

    Hey,
    The All About BI video covers the logic that you would build inside the ForEach activity.

    So let's take your case as an example.

    Let's say you have 4 files in the blob path:
    Path : raw/folder1/
    f1.csv
    f2.csv
    f3.csv
    f4.csv

    Your first Get Metadata activity uses a dataset that points to the path raw/folder1/. With the Child items field selected, its output is the list of file names:

    Get Metadata activity 1 child item list output:
    f1.csv
    f2.csv
    f3.csv
    f4.csv

    Now you provide this array as the input to the ForEach activity, so the loop runs 4 times, once per file.

    Within the ForEach activity:
    1st iteration:
    item().name value would be f1.csv

    The second Get Metadata activity, inside the ForEach activity, uses the dataset to which you need to pass the file name: item().name
    So in the 1st iteration: f1.csv

    So the dataset resolves to the full path:
    raw/folder1/f1.csv. It now points to a single file, which is what makes the Structure field of the Get Metadata activity available.

    Let's assume the structure of the file f1.csv is

    c1,c2,c3

    Note: this is just a sample one

    You can compare this header with your sample reference.

    Based on the comparison, the next actions would be taken.

    Once the 1st iteration is done, the ForEach activity moves to the 2nd iteration, where item().name would be f2.csv, and it follows the same process as for f1.csv.

    So the YouTube video covers the logic inside the ForEach activity.

    As for the default value, you can enter any value. At run time, if a value is passed to the parameter, that value is used; otherwise the default value is used.
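
As a rough illustration of the walkthrough above, here is a minimal sketch in plain Python (not ADF code) that mimics the ForEach loop. The dictionary stands in for the output of the first Get Metadata activity, and `resolve_dataset_path` and the default `placeholder.csv` are hypothetical names used only to show how a parameterized dataset picks the passed value over its default.

```python
# Stand-in for the output of the first Get Metadata activity
# (Child items field), using the file names from the example above.
get_metadata1_output = {
    "childItems": [
        {"name": "f1.csv", "type": "File"},
        {"name": "f2.csv", "type": "File"},
        {"name": "f3.csv", "type": "File"},
        {"name": "f4.csv", "type": "File"},
    ]
}

FOLDER = "raw/folder1"  # folder path from the example above


def resolve_dataset_path(filename=None, default="placeholder.csv"):
    """Hypothetical helper: mimic a dataset with a 'filename' parameter.

    The passed value wins; the default is used only when nothing is passed.
    """
    chosen = filename if filename is not None else default
    return f"{FOLDER}/{chosen}"


# ForEach: Items = @activity('Get Metadata1').output.childItems
resolved_paths = []
for item in get_metadata1_output["childItems"]:
    # item["name"] plays the role of @item().name in each iteration
    resolved_paths.append(resolve_dataset_path(filename=item["name"]))

print(resolved_paths)
print(resolve_dataset_path())  # no value passed, so the default is used
```

Because the loop always passes a file name, the default never takes effect inside the ForEach activity in this sketch.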


1 additional answer

  1. ShaikMaheer-MSFT 38,631 Reputation points Microsoft Employee Moderator
    2023-08-04T07:05:38.36+00:00

    Hi King Java,

    Thank you for posting your query on the Microsoft Q&A platform.

    Q. Why do we need a parameter on the second dataset (in this case 'filename'), and what should I put as the Default value inside the box? And what is the purpose of "@item().name"?

    A. Your second dataset is used inside the ForEach activity. The ForEach activity takes childItems (the file names from the folder) and iterates over them. If you look at the output JSON of the first Get Metadata activity, each child item is a JSON object with the keys name and type. So inside the ForEach activity, the expression to access the file name is @item().name. This file name is passed to the second dataset so that it can point to each file dynamically while fetching its structure, which is why the second dataset is parameterized. There is no need to set a default value for the second dataset's parameter.

    Please check the videos below, which explain parameterization and how to pass the output of one activity to another.

    Parameterize Datasets in Azure Data Factory

    How to read JSON output of one Activity in to another Activity in Azure Data Factory

    Hope this helps. Please let me know if you have any further queries.


    Please consider hitting the Accept Answer button. Accepted answers help the community as well.
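
To make the point about the JSON keys concrete, here is a small Python sketch (not ADF code) of the assumed shape of the first Get Metadata activity's output. The JSON text is written by hand for illustration, reusing file names from the accepted answer; it shows why the whole item is an object and why `.name` is needed to get just the file name.

```python
import json

# Assumed shape of the first Get Metadata activity's output when the
# Child items field is requested (hand-written for illustration).
raw_output = """
{
  "childItems": [
    {"name": "f1.csv", "type": "File"},
    {"name": "f2.csv", "type": "File"}
  ]
}
"""

child_items = json.loads(raw_output)["childItems"]

first_item = child_items[0]      # what @item() refers to in iteration 1
first_name = first_item["name"]  # what @item().name refers to

# The item is an object with two keys, not a plain string.
print(type(first_item).__name__, sorted(first_item))
# Only the name is what the dataset's file name parameter needs.
print(first_name)
```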

