How to check text in row 1 of Excel file for about 10 column headers?
I'm using Azure Data Factory (ADF) on the portal. I use the latest Chrome browser to get to ADF. I'm not an admin and have no admin rights. I have an Excel file I want to read. I'm still fairly new to ADF.
I'm guessing this could be done in a data flow. My plan was to have one pipeline execute many data flows. Each data flow would be different steps in the whole process.
We sometimes get Excel files with the columns in the wrong order, missing columns, added columns we do not want, etc, so I have to check about 10 column names to make sure the column names match what we are expecting. If one column name is incorrect the program should trigger/display an error and stop.
The tutorials I completed did not cover this. A Google search and AI question also did not help.
In my data flow called "dfCheckHeaders" I have these transformations:
- My source transformation: My data source points to a single Excel spreadsheet with 1000s of rows. Allow Schema Drift is checked. The file is .XLSX.
- Next is a SELECT transformation. Under the tab "Select Settings" I only select the columns I need for this.
- Next is a CAST transformation to change one column from a string to a float type.
- Next is a sink. It points to a container called "hdrschecked" in my blob storage.
In my search I did find a mention of using the "row_number()" function but the page wasn't clear if this was an ADF function of SQL function. Since I'm accessing an Excel spreadsheet, using row_number as an SQL function does not make sense.
Can anyone help me or point me to a page or tutorial I can do for this?
Thank you!