Azure Data Factory - Split Column Range to Rows

Swapneel Hosur 106 Reputation points
2022-11-03T06:23:49.263+00:00

Hi there,

I am using Azure Data Factory Data Flow, I have two columns with Order Id as Primary Key and Order Units in Range as shown below
Order Id, Order Units
1, 1..10
2, 2400..2499
3, 101..500
4, 12000..12999

I need to expand the range as below (Split and Increment)
Order Id, Order Units
1, 1
1, 2
...
1,10

2,2400
2,2401
......
2,2499

I was wondering if you could advise me a method of achieving this, I thought of two options:

  1. Create surrogate key and then row number but the same does not operate at each primary key.
  2. Using iterator outside Dataflow, based on difference of each range iterating the for loop and incrementing value.

Thank you very much.
Swapneel

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,643 questions
0 comments No comments
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 31,151 Reputation points Microsoft Employee
    2022-11-03T18:00:50.147+00:00

    Hi @Swapneel Hosur ,

    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    As per my understanding, you are trying to split the range of column values into separate row values using mapping dataflow . Please let me know if my understanding is incorrect.

    I tried to think through both the possibilities that you have mentioned. However , none of them seemed a feasible solution . Also, unfortunately, there is no range function available yet in dataflow. So , we have to take a long way of using ADF pipeline activities to get the integers between the range along with calling dataflow for further transformations. Kindly check the below approach:

    1. Use LookUp activity and point the dataset to the source file. Connect it to the ForEach block and use the followng expression @activity('Lookup1').output.value as items.

    256859-01.gif

    2. Use Script activity having dataset pointing to SQL database. Select Non Query option and provide this query to create a table : if object_id('dbo.expandRange') is null create table dbo.expandRange(OrderID int,ObjectUnits varchar(max)); Truncate table dbo.expandRange;

    256904-02.gif

    3. Create a variable var1 of Array datatype . Inside ForEach activity , use Set variable activity and select var1 . Provide this expression : @range(int(split(item().OrderUnits,'..')[0]),add(sub(int(split(item().OrderUnits,'..')[1]),int(split(item().OrderUnits,'..')[0])),1)) in the value.

    256913-03.gif

    4. Use script activity after set variable to insert the data into SQL table using the following query: Insert into dbo.expandRange select @{item().OrderID},'@{variables('var1')}'

    256863-04.gif

    5. Now create a mapping dataflow , use the above SQL table as source in dataflow. Here ObjectUnit is coming as string although values are coming in array format. We need to convert it into array. Use derived column transformation to remove brackets replace(replace(ObjectUnits,'[',''),']','') . Use another Derived column transformation to convert it into array split(ObjectUnits, ','). Use flatten transformation to convert values into individual rows. Refer to the below video:

    256905-05.gif

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button and take satisfaction survey whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful