Azure Data Factory - Derived column question

Adrian 181 Reputation points
2021-08-24T11:03:46.553+00:00

Hey,

I am trying to utilize Azure Data Factory for a transformation task.

I have a spreadsheet with multiple sheets (15). Each of these sheets contains information about a specific product.

This is an example:

126046-image.png

What I am after is a derived column in an entirely new sheet in that Excel file, containing a list of Product-to-Category information.

In the example above product '1000022' belongs to TTT0000202, TTT0000445 and TTT0000694. So this product will appear in the final list three times, for the three categories.

Any help as to how I should approach this? There aren't many tutorials on Data Factory that I could find, and none for this case anyway.

Thanks!


Accepted answer
  1. Adrian 181 Reputation points
    2021-08-26T15:11:05.76+00:00

    Hey, really appreciate you looking into this for me. I also know how to properly use the array() function now, which is very helpful.

    I had actually tried it the same way but was still getting the same error. What ended up fixing it was removing the space from my new column name! I had the column named 'Category Numbers'; as soon as I changed it to 'CategoryNumbers' it started to work. Not sure if this is a restriction that Azure implemented deliberately, but oh well, learning every day!

    Cheers!


1 additional answer

  1. ShaikMaheer-MSFT 38,466 Reputation points Microsoft Employee
    2021-08-25T11:15:11.85+00:00

    Hi @Adrian ,

    Thank you for posting your query on the Microsoft Q&A platform.

    You need to use a derived column transformation to generate a new column that holds an array of the three categories, built with the array() function. Then use a flatten transformation to unroll that array into multiple rows, one per category.

    Please check the implementation below to get a better understanding.

    Step 1: Source transformation
    126333-image.png

    Step 2: Derived column transformation. Use array() to combine all three categories into an array in a new column called "categories".
    126373-derivedcol.gif

    Step 3: Flatten transformation, to flatten the "categories" array into multiple rows.
    126288-flatten.gif

    Step 4: Select transformation, to keep only the "Product" and "categories" columns.
    126305-select.gif

    Once all transformations are done, you can use a Sink transformation to load the result into the desired target.
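
    The logic of the derived column, flatten, and select steps can be sketched outside Data Factory in plain Python. This is only an illustration of what the transformations do to the rows, not data flow expression syntax. The input column names Category1, Category2 and Category3 are hypothetical, since the real names are only visible in the screenshot.

```python
# Plain-Python sketch of the derive-array-then-flatten logic.
# Assumption: the three category columns are named Category1..Category3
# (hypothetical names; the real sheet may use different headers).

CATEGORY_COLUMNS = ["Category1", "Category2", "Category3"]


def derive_categories(row):
    """Mimic the derived column step: gather the category columns into one list."""
    derived = dict(row)
    derived["categories"] = [row[col] for col in CATEGORY_COLUMNS]
    return derived


def flatten_and_select(rows):
    """Mimic flatten + select: one output row per category, keeping two columns."""
    for row in rows:
        for category in row["categories"]:
            yield {"Product": row["Product"], "categories": category}


# Sample row built from the product and categories mentioned in the question.
source_rows = [
    {
        "Product": "1000022",
        "Category1": "TTT0000202",
        "Category2": "TTT0000445",
        "Category3": "TTT0000694",
    },
]

result = list(flatten_and_select(derive_categories(r) for r in source_rows))

for pair in result:
    print(pair["Product"], pair["categories"])
```

    With the sample row, product 1000022 ends up in three output rows, one for each of its categories.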

    Hope this will help. Thank you.
