How to Merging Columns after a join in Azure Synapse Data Flow

Harikrishnan Arayakeel 25 Reputation points
2024-12-05T11:37:39.29+00:00

Subject: Assistance with Merging Columns in Azure Synapse Data Flow

Hi [Recipient's Name],

I’m working with two tables in Azure Synapse data flow. Both tables, table1 and table2, have the same columns: [A], [B], [C], and [D]. I’ve applied a left outer join using [D] as the key, with table1 as the left table.

In the Join output, I’m getting separate columns such as [table1@A], [table2@A], [table1@B], [table2@B], and so on. I’d like to merge the corresponding columns (e.g., [table1@A] and [table2@A]) into a single column [A].

Is there an easier way to achieve this other than creating a derived column with a condition like: if (!isNull([table1@A]), [table1@A], [table2@A])? It is taking alot time to write code for each cloumn to do it this way.

I have tried to use 'Exist' condition, but there is only 'Exist' and 'not exist' condition which is not conveneient.

Looking forward to your suggestions.

Best regards, Harry

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,377 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Ganesh Gurram 7,295 Reputation points Microsoft External Staff Moderator
    2024-12-06T01:09:10.6533333+00:00

    Hi @Harikrishnan Arayakeel

    Thanks for the question and using MS Q&A forum.

    Unfortunately, there is no direct built-in "merge columns" function for joined tables in Synapse Data Flow. But there are ways to simplify and optimize the process.

    Below are some options that could make your task easier:

    Option1: Use a Derived Column for Each Pair:

    You are correct that one option is to create a Derived Column transformation where you handle the merging with an expression like this:

    if(isNull([table1@A]), [table2@A], [table1@A])
    

    Use Expressions: For each column pair, you can use the above expression, Repeat the same for columns [B], [C], and [D]. While this still requires some manual work, you can copy and paste the expression for each column, which may save you some time.

    Option2: Use a Data Flow Parameter:

    If you have many columns to merge, consider using a parameterized approach: Create a Parameter:

    Create a parameter in your Data Flow that holds the list of column names you want to merge. In the Derived Column transformation, use an expression that references the parameter. This way, you can dynamically generate the merge logic based on the parameter values. Unfortunately, you still need to write the merge logic for each column, but you can manage the column names through the parameter, making it easier to adjust if needed.

    Option3: Use Mapping Data Flow:

    If you are open to using Mapping Data Flow, you can create a more visual representation of your transformations:

    Join Transformation: Perform your left outer join as you have done. After the join, use a Select transformation to choose the columns you want to keep. Use a Derived Column transformation to create new columns based on the logic you described. If applicable, you can also use an Aggregate transformation to consolidate data further.

    While there is no direct feature to automatically merge columns after a join in Azure Synapse Data Flow, using derived columns with structured expressions or leveraging parameters can help streamline the process. Unfortunately, some manual effort is still required, but organizing your approach can make it more manageable.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click `Accept Answer` and `Yes` for was this answer helpful. And, if you have any further query do let us know.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.