How to load Excel from ADLS into Azure SQL using ADF while skipping header rows and formatting?

YERNAIDU SIRAPARAPU 125 Reputation points
2025-09-02T09:53:01.7066667+00:00

I am working with Excel files stored in Azure Data Lake Storage (ADLS) that contain inventory data. These files include:

Some extra formatting rows at the top (e.g., company name, date, location).

Headers split across two rows, as shown in the screenshot below.

I need to load only the structured Inventory Report section into an Azure SQL Database table with the following schema:

CREATE TABLE Inventory (
    [Material Part Number] NVARCHAR(255) NULL,
    [Material Description] NVARCHAR(255) NULL,
    [B2B OnHand] DECIMAL(18,4) NULL,
    [B2B Available] DECIMAL(18,4) NULL,
    [B2B OnOrder] DECIMAL(18,4) NULL,
    [B2C OnHand] DECIMAL(18,4) NULL,
    [B2C Available] DECIMAL(18,4) NULL,
    [B2C OnOrder] DECIMAL(18,4) NULL
);

My challenges are:

How to configure an ADF pipeline to skip the extra non-data rows and handle two-row headers so that only the correct table headers are mapped.

What’s the best way to flatten/transform the headers (e.g., convert “Material + Part Number” into a single column “Material Part Number”).

How to handle data type conversions (Excel strings like "$120" into SQL DECIMAL(18,4)).

Should I rely on Copy Activity with schema mapping, or use a Mapping Data Flow for header cleanup and type conversion?

What is the recommended ADF pipeline design for this type of Excel-to-SQL load?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 41,121 Reputation points Volunteer Moderator
    2025-09-02T12:30:12.4833333+00:00

    Hello!

    Thank you for posting on Microsoft Learn Q&A.

    In your case, I recommend that you use a mapping data flow.

    Start by creating an Excel dataset pointing to ADLS, with parameters:

    • pFolder, pFileName (path to the file)
    • pSheet
    • pHeaderRow (the row number of the second header line, the one with Part Number, Description, On Hand, and so on)

    In the dataset, set:

    • firstRowAsHeader = true
    • range = @{format('{0}!A{1}:H1048576', dataset().pSheet, dataset().pHeaderRow)}

    Adjust the end column/row for your sheet; this range skips the decorative rows and uses the second header line as the header.

    Why start at the second header line? Because the Excel connector can only use one header row. The first line would cause duplicate names (On Hand appears twice); starting at the second line gives you stable column positions.
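    Outside ADF, the same skip-and-reheader logic can be sketched in plain Python over in-memory rows. The worksheet contents and row numbers below are hypothetical, matching the layout described in the question:

    ```python
    # Simulated worksheet rows (1-based in Excel; 0-based in this list).
    # Rows 1-3 are decorative; row 4 is the first header line (group names);
    # row 5 is the second header line that we actually use as the header.
    rows = [
        ["Contoso Ltd.", "", "", ""],                             # company name
        ["Inventory Report", "", "", ""],                         # report title
        ["2025-09-01", "Seattle", "", ""],                        # date / location
        ["", "", "B2B", "B2B"],                                   # first header line (groups)
        ["Part Number", "Description", "On Hand", "Available"],  # second header line
        ["P-001", "Widget", "$120", "100"],
        ["P-002", "Gadget", "(45)", "0"],
    ]

    header_row = 5  # equivalent of the pHeaderRow dataset parameter

    header = rows[header_row - 1]   # the second header line becomes the header
    data = rows[header_row:]        # everything below it is data
    records = [dict(zip(header, r)) for r in data]
    ```

    This mirrors what the range expression does: everything above pHeaderRow is never read, and the line at pHeaderRow supplies the column names.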

    Then, for the pipeline orchestration:

    • Get Metadata on the ADLS folder to list .xlsx files.

    • ForEach file: call a mapping data flow activity and pass the dataset parameters (pFolder, pFileName, pSheet, pHeaderRow). If the header row varies per file, keep a small control table (FilePattern, HeaderRow) in SQL and Lookup it before the ForEach.
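    The Get Metadata → Lookup → ForEach flow can be sketched in Python. The control table contents, file names, and parameter values below are all hypothetical placeholders for what the Lookup and ForEach would pass along:

    ```python
    import fnmatch

    # Hypothetical control table (FilePattern, HeaderRow) as a Lookup would return it.
    control_table = [
        {"FilePattern": "inventory_*.xlsx", "HeaderRow": 5},
        {"FilePattern": "*.xlsx", "HeaderRow": 1},  # fallback rule
    ]

    def header_row_for(file_name: str) -> int:
        """Mimic the Lookup: the first matching pattern wins."""
        for rule in control_table:
            if fnmatch.fnmatch(file_name, rule["FilePattern"]):
                return rule["HeaderRow"]
        raise ValueError(f"No header-row rule for {file_name}")

    # Mimic Get Metadata + ForEach: one data-flow invocation per file,
    # each carrying the dataset parameters.
    files = ["inventory_2025-09-01.xlsx", "returns.xlsx"]
    calls = [
        {"pFolder": "inventory", "pFileName": f, "pSheet": "Sheet1",
         "pHeaderRow": header_row_for(f)}
        for f in files
    ]
    ```

    Ordering the control table from most to least specific pattern keeps the fallback rule from shadowing the per-file rules.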

    In your mapping data flow:

    Source

    • Use the Excel dataset above.
    • You’ll likely get columns like: Part Number, Description, On Hand, Available, On Order, On Hand1, Available1, On Order1 (ADF deduplicates repeated column names by appending 1).

    Select / rename (flatten headers): rename to your final names by position:

    Part Number → Material Part Number

    Description → Material Description

    On Hand → B2B OnHand

    Available → B2B Available

    On Order → B2B OnOrder

    On Hand1 → B2C OnHand

    Available1 → B2C Available

    On Order1 → B2C OnOrder
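    The rename mapping above can be expressed as a simple lookup table; this Python sketch assumes the deduplicated names ADF is likely to produce, as listed above:

    ```python
    # Deduplicated source names -> final SQL column names.
    rename_map = {
        "Part Number": "Material Part Number",
        "Description": "Material Description",
        "On Hand": "B2B OnHand",
        "Available": "B2B Available",
        "On Order": "B2B OnOrder",
        "On Hand1": "B2C OnHand",
        "Available1": "B2C Available",
        "On Order1": "B2C OnOrder",
    }

    def rename(row: dict) -> dict:
        """Rename a record's keys, leaving unmapped keys unchanged."""
        return {rename_map.get(k, k): v for k, v in row.items()}
    ```

    Because the dataset range pins the header row, the source column positions (and therefore these names) stay stable from file to file.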

    If your amounts might arrive as text with symbols (e.g. "$120" or "(120)"), create six derived columns that produce decimal values:

    toDecimal(
      iif(
        isNull(toString(col)) || trim(toString(col)) == '',
        null(),
        replace(
          regexReplace(
            // turn "(120)" into "-120"
            iif(rLike(toString(col), '^\(.*\)$'),
                concat('-', regexReplace(toString(col), '[\(\)]', '')),
                toString(col)
            ),
            '[^0-9\.\-]',     // keep digits, dot, minus; drop currency signs, spaces, commas
            ''
          ),
          ',', ''            // extra safety if commas slip through
        )
      )
    )
    

    Apply that to each of:

    • B2B OnHand, B2B Available, B2B OnOrder,
    • B2C OnHand, B2C Available, B2C OnOrder.
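    For reference, the same cleanup logic in Python (a sketch mirroring the data-flow expression above, not ADF code), handy for unit-testing the rules against sample values before building the data flow:

    ```python
    import re
    from decimal import Decimal
    from typing import Optional

    def to_decimal(raw) -> Optional[Decimal]:
        """Mirror the data-flow expression: treat "(120)" as -120, then
        strip everything except digits, dot, and minus."""
        if raw is None:
            return None
        s = str(raw).strip()
        if s == "":
            return None
        if re.fullmatch(r"\(.*\)", s):       # accounting negative: "(120)" -> "-120"
            s = "-" + re.sub(r"[()]", "", s)
        s = re.sub(r"[^0-9.\-]", "", s)      # drop currency signs, spaces, commas
        return Decimal(s) if s not in ("", "-") else None
    ```

    For example, "$120" becomes 120, "(45)" becomes -45, "1,200.50" becomes 1200.50, and an empty cell becomes NULL.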

    Then, in the sink (Azure SQL Database):

    • Table: dbo.Inventory
    • Mapping: map your renamed columns directly to:
      • [Material Part Number]
      • [Material Description]
      • [B2B OnHand], [B2B Available], [B2B OnOrder]
      • [B2C OnHand], [B2C Available], [B2C OnOrder]

    Don't forget to set the table action to allow insert, with a pre SQL script of TRUNCATE TABLE dbo.Inventory if you're doing a full refresh.

