Data Factory. Adding Column Name when your files doesn't have column headings

Debbie Edwards 521 Reputation points
2023-12-04T11:13:21.8933333+00:00

I have an issue. My source file is an xlsx and because its not really in the right format I have to do a lot of processing.

The first row has one column populated and the rest are just null.

So I have to set first row as header as unticked

FirstRowAsHeaderUnticked

And I transform the data later into proper tables.

This works fine. However I need the File name adding as a column

AdditionalColumns

in source I add Additional Columnsmapping

And I take the columns and the file name into my SQL Table

But I now get this error

Mixed properties are used to reference 'source' columns/fields in copy activity mapping. Please only choose one of the three properties 'name', 'path' and 'ordinal'. The problematic mapping setting is 'name': 'FileName', 'path': '','ordinal': ''.

This is clearly because I am not using name for 1 2 3 and 4 (Because there are no column names) But I am using FilePath for the last column mapping.

This usually works fine when I have column headings but unfortunately this file has to be as is.

Also I should be sticking to a copy activity for this. can anyone shed any light on how I can get past this?

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,133 questions
{count} votes

3 answers

Sort by: Most helpful
  1. Suba Balaji 11,206 Reputation points
    2023-12-04T12:19:10.7566667+00:00

    Hello
    Debbie Edwards
    ,

    One workaround is to declare the filename as 5 in the additional column in source tab, and use it in the mapping tab.

    So, ultimately, you would end up using 1,2,3,4,5 in the mapping. But one catch would be to skip the header row being written to the target table (probably you can use the range field in the source dataset)

    Edit:

    The above workaround doesnt work We should either use a data flow or use two copy activities.

    The above workaround doesnt work We should either use a data flow or use two copy activities.

    1. Convert the excel file to a csv file
    2. Then another copy activity, make that csv file as source and add additional column filename, and load that to table. (supposingly, csv file has the same name as the excel file)

    Steps:

    Pipeline design:

    Screenshot 2023-12-04 at 7.38.04 PM

    • Get metadata: to get all files with .xlsx extension from data lake.
    "childItems": [
    		{
    			"name": "20231124",
    			"type": "Folder"
    		},
    		{
    			"name": "A.xlsx",
    			"type": "File"
    		},
    		{
    			"name": "A1.xlsx",
    			"type": "File"
    		},
    		{
    			"name": "output",
    			"type": "Folder"
    		}
    	],
    
    • Filter is to filter out the folders, and to just retain the file names, as getmetadata brings all .xlsx and folders.
    "Value": [
    		{
    			"name": "A.xlsx",
    			"type": "File"
    		},
    		{
    			"name": "A1.xlsx",
    			"type": "File"
    		}
    	]
    
    • Loop through filtered file names, load one by one using foreach. Foreach setting:
    @activity('Filter1').output.value
    
    • Copy activity 1 source tab: Screenshot 2023-12-04 at 7.50.21 PM

    Source dataset:

    Screenshot 2023-12-04 at 7.51.19 PM

    Sink tab:

    Screenshot 2023-12-04 at 7.52.14 PM

    Sink data set:

    Screenshot 2023-12-04 at 7.54.15 PM

    Copy Mapping tab:

    Screenshot 2023-12-04 at 7.55.08 PM

    Copy activity 2

    Source tab:

    Screenshot 2023-12-04 at 7.55.54 PM

    Sink tab:

    Screenshot 2023-12-04 at 7.56.44 PM

    Note: We need to import the schema in both copy activity source.

    So what the above pipeline does?

    It loops through two xlsx files, A.xlsx and A1.xlsx (TAB NAME: SUBSCRIBE) , where both the file structure are the same. These file data are loaded to a sql table.

    Table data after loading:

    Screenshot 2023-12-04 at 7.59.32 PM

    As you note from here, we can get rid of the .csv from your table column easily! Either by an update or you can also try to avoid the concat() that i have used in the copy activity 1 sink and copy activity 2 source.

    Hope it helps.

    Please let us know if you dont understand any piece of it. Would be happy to asisst.

    Thanks!


  2. Debbie Edwards 521 Reputation points
    2024-01-03T12:28:46.87+00:00

    I cant find the steps that convert the xlsx to a csv? I can't have any manual processes. I was just wondering there that was in the process mentioned above?

    And I am really struggling to understand the get meta data step. this is hard coded but in reality I wont know what the date is.

    	"name": "20231124",
    			"type": "Folder"
    
    0 comments No comments

  3. ShaikMaheer-MSFT 38,326 Reputation points Microsoft Employee
    2024-01-04T05:52:20.9433333+00:00

    Hi Debbie Edwards,

    Thank you for posting query in Microsoft Q&A Platform.

    In above answers, it's not manual conversion of xlsx to csv. Consider using copy activity and then copy your xlsx file as csv file. And then use another copy activity to take csv file and load into destination table with additional column.

    Or try below,

    Since missing headers causing issue, we can consider writing code in azure functions to modify first row of xlsx file to keep some dummy headers and then continue your current implementation process.

    Hope this helps. Please let me know how it goes.


    Please consider hitting Accept Answer button. Accepted answers help community as well. Thank you.

    0 comments No comments