Join & merge

Pravalika-randstad 240 Reputation points
2023-10-19T11:26:40.6866667+00:00

have encountered a challenge in Azure Data Factory, where I need to perform a join or merge operation between two datasets that lack a common identity column. This may be an oversight on my part, as it seems like a straightforward task, yet I'm unable to accomplish it using either a join or a union operation.

One of the datasets contains only a few rows with a 'name' column, let's label them as A, and C. On the other hand, the second dataset consists of thousands of rows (1 to N).

My goal is to create combinations for each row in the larger dataset, pairing it with A, B, and C from the smaller dataset, resulting in the following pattern: 1A 1B 1C 2A 2B 2C…

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,196 questions
{count} votes

Accepted answer
  1. Smaran Thoomu 12,620 Reputation points Microsoft Vendor
    2023-10-20T11:31:55.85+00:00

    Hi @Pravalika-randstad
    Welcome to Microsoft Q&A platform and thanks for posting your question here.
    As I understand, you want to generate combinations for each row in the larger dataset, pairing it with A, B, and C from the smaller dataset, resulting in the following pattern: 1A 1B 1C 2A 2B 2C, and so on. If this is not correct, please let me know.
    According to your requirements, you can consider the following suggestion.

    • You can use **Custom (cross) **join type in the Join to get the result in this case.
      I have created sample Large Dataset(Numbers) with numbers up to 10 and Small Dataset(Letters) with A,B,C.
    • Now, use Join with Large dataset as left and small dataset as right and use custom join with the condition as true().
    • In the Optimize of Join, select off at the Broadcast to get the above format of the data. You can see the merge of two datasets below.
      User's image
    • If you would like to have the result in a single column with values like 1A, 1B, 1C, and so on, you can achieve this by first using a derived column to concatenate using concat(toString(Numbers), Alphabet) for the values mentioned above, and then select the desired column. User's image
    • Now, use the select operation to choose any column. However, in the GIF below, I am demonstrating the entire process to show you how we can achieve your requirement.

    merge12

    Alternate approach:
    To achieve the same thing would be to use surrogate key transformations in both the sources which would add an additional column in both the dataset (say Id1, Id2) and then use derived column transformation with expression as : iif(Id1%3==0,3,toInteger(Id1%3)) to split the rows into required number of records (ex: 3) and then use join transformation.

    Here is an example: https://youtu.be/coWWmyKHCMs?list=PLsJW07-_K61IFEbcrImrHeLaS6C_voe0E&t=729

    I hope this helps! Let me know if you have any further questions.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful