Challenges with File Processing and Azure SQL Table Insertion

Brianna C 120 Reputation points
2023-11-12T18:42:41.23+00:00

I am developing a pipeline where I retrieve file names from an SFTP source using the Get Metadata activity, filter the files I need, and then process them into an Azure SQL table. While the Copy activity successfully copied the files, I ran into issues when adding two extra columns – processing date and file name. Using a Data Flow, I had difficulty detecting the file columns, and the mapping in the sink only showed the derived columns. Even with the Copy activity, I'm uncertain how to incorporate the derived columns. Can anyone suggest a solution to address these sink/transformation challenges using either the Copy activity or the Data Flow activity?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Answer accepted by question author
  1. Smaran Thoomu 32,525 Reputation points Microsoft External Staff Moderator
    2023-11-13T12:06:46.2166667+00:00

    Hi @Brianna C,

    Thank you for reaching out to us with your query. 

    Based on the structure of your pipeline, inside the ForEach activity you are passing each file name to the data flow using dataset parameters.

    Because you are transforming all of the files dynamically, one per iteration, the source schema was set to None when the dataset was created.
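    To make that concrete, here is a minimal sketch of what such a parameterized dataset typically looks like. The names (SftpDelimitedFile, SftpLinkedService, the fileName parameter, the incoming folder) are placeholders rather than your actual objects, and the ForEach activity would pass @item().name from the Get Metadata output into fileName:

    ```json
    {
        "name": "SftpDelimitedFile",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": "SftpLinkedService",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "fileName": { "type": "string" }
            },
            "typeProperties": {
                "location": {
                    "type": "SftpLocation",
                    "folderPath": "incoming",
                    "fileName": {
                        "value": "@dataset().fileName",
                        "type": "Expression"
                    }
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": true
            },
            "schema": []
        }
    }
    ```

    Because the schema array is empty, the data flow source has no projection to show at design time, which is exactly why debug behaves the way you described.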

    Normally, when you create a dataset, the schema of a specific file is imported from the connection/store, and that imported schema is then used by the data flow source.

    Regarding: "I faced difficulty detecting file columns, and mapping in the sink activity only showed derived columns."

    In data flow debug, you will indeed observe only the derived columns. However, when you run the data flow through the pipeline, it will dynamically produce the desired result based on the schema of the source file in each iteration.

    In the demonstration below, I took one file with Import schema set to None in the dataset, and the source shows 0 rows in the data flow debug preview.

    I then added one derived column, 'Location', and the sink mapping showed only that single column.
    But when I executed the data flow through the pipeline, it picked up the schema dynamically from the file, and the result included all of the existing columns from the source file as well, along the lines of the sketch that follows.
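    In JSON form, a data flow along these lines would reproduce that behavior. It is only a sketch, not your exact definition: the dataset references, transformation names, and the fileName data flow parameter are assumptions, and instead of the demo's 'Location' column it adds the file name and processing date columns you asked about. Schema drift is enabled on the source and sink, the Derived Column transformation adds the two extra columns, and the sink is left to auto-map:

    ```json
    {
        "name": "AddColumnsDataflow",
        "properties": {
            "type": "MappingDataFlow",
            "typeProperties": {
                "sources": [
                    {
                        "dataset": { "referenceName": "SftpDelimitedFile", "type": "DatasetReference" },
                        "name": "sftpSource"
                    }
                ],
                "sinks": [
                    {
                        "dataset": { "referenceName": "AzureSqlTargetTable", "type": "DatasetReference" },
                        "name": "sqlSink"
                    }
                ],
                "transformations": [
                    { "name": "AddColumns" }
                ],
                "scriptLines": [
                    "parameters{",
                    "     fileName as string",
                    "}",
                    "source(allowSchemaDrift: true,",
                    "     validateSchema: false) ~> sftpSource",
                    "sftpSource derive(FileName = $fileName,",
                    "     ProcessingDate = currentTimestamp()) ~> AddColumns",
                    "AddColumns sink(allowSchemaDrift: true,",
                    "     validateSchema: false) ~> sqlSink"
                ]
            }
        }
    }
    ```

    One way to populate the FileName column (an assumption here, not necessarily how your flow is wired) is to pass the same file name, for example @item().name, from the pipeline into the fileName data flow parameter. Because no explicit sink mapping is defined and schema drift is enabled end to end, the drifted source columns plus the two derived columns all land in the Azure SQL table at run time, even though debug preview only shows the derived columns while the dataset schema is empty.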

    So, the behavior above won't affect your output. If you want to cross-check the original data while building the transformations, point the dataset at one of your source files once and import its schema in the dataset. After importing the schema, put your dataset parameter back into the file name so that each iteration still picks up its own file (see the abridged sketch below).
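    An abridged sketch of that state follows. The column names OrderId, Amount, and OrderDate are purely hypothetical stand-ins for whatever your files actually contain, and the delimiter, header, and linked service settings from the earlier sketch are omitted for brevity:

    ```json
    {
        "name": "SftpDelimitedFile",
        "properties": {
            "type": "DelimitedText",
            "parameters": {
                "fileName": { "type": "string" }
            },
            "typeProperties": {
                "location": {
                    "type": "SftpLocation",
                    "folderPath": "incoming",
                    "fileName": { "value": "@dataset().fileName", "type": "Expression" }
                }
            },
            "schema": [
                { "name": "OrderId", "type": "String" },
                { "name": "Amount", "type": "String" },
                { "name": "OrderDate", "type": "String" }
            ]
        }
    }
    ```

    The imported schema gives you a projection and a working data preview at design time, while the parameterized file name keeps each ForEach iteration pointing at its own file.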

    The data flow will then pick up that schema, and you can see the data preview while doing the transformations.
    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click "Accept Answer" and "Yes" for "Was this answer helpful". And, if you have any further query, do let us know.


0 additional answers
