Share via

Alter Row and Derived Column Not Working Correctly

Bill LaLonde 50 Reputation points
Feb 14, 2023, 2:32 AM

I'm building out a data flow and would like to update rows similar to how we use Update, Set, and Where in SQL using Synapse Data Flow. The documentation video found here: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row , recommends using Alter Row and Derived Column tasks to accomplish this.

Flow: CSV missing state value -> identify row with establishment number using Alter Row -> Update row using Derived Column -> Allow Update to Sink (Azure SQL DB table).

However, as you can see, I've set the Update If condition to the rows matching my establishment number listed. In the next task, I set the State column (st) to what I want the conditioned rows updated to, MN. But when checking the preview and the output table, you'll see that all rows have had their state column updated to MN, not just the ones I conditioned for.

I watched the demo a bunch of times and can't seem to pinpoint what's happening here. Any ideas?

combo

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,121 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,101 questions
{count} votes

Accepted answer
  1. ShaikMaheer-MSFT 38,496 Reputation points Microsoft Employee
    Feb 27, 2023, 9:51 AM

    Hi Bill LaLonde ,

    If your requirement is to perform insert if the data doesn't exist as well as update on already existing data, then I would recommend you go for Upsert option.

    Now, when there is no data in the sink, it will insert the same data as it is from the source. From the next run onwards, in case there is some changes in the source data, it will perform update based on the update if condition you have provided in the alter row condition.

    So, as mentioned correctly by @AnnuKumari-MSFT if there is no data in sink, there is nothing to update on as it is an action to be performed on the sink. However, you can perform transformations on the source data using several transformations present in dataflow. For this case:

    • You can use a filter transformation or conditional split transformation to divide the whole dataset into two parts one having est_number='V18677A' and rest others will fall under default case.
    • Then you can use derived column transformation to modify the St column to 'MN'
    • Afterwards, you can union the two datasets using union transformation.
    • And use Sink transformation to load the data into SQL

    In this way you don't need to insert the source data first and then perform transformation for updating the data in sink because before inserting we are taking care of the transformation.

    dataflowgif

    Hope this helps.


    Please consider hitting Accept Answer and also hit yes for was answer helpful? question. Accepted answers helps community as well. Thank you.

    1 person found this answer helpful.

2 additional answers

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,211 Reputation points
    Feb 14, 2023, 11:34 PM

    @Bill LaLonde Hello and welcome to Microsoft Q&A.

    As I understand, you are using Data Flow alter row, but more than just the intended rows are getting changed.

    Looking closely at your screenshots (which are very helpful), I notice the other rows are marked for insert.

    The derived column affects all rows. In the sink there is a setting for what operations (and which rows by extension) to allow. I suspect you have enabled insert, when only update should be enabled. I base this upon the logic that if a row is marked for insert, and later appears in your sink, then if it is not be updated it must be inserted, so insert must be enabled.

    User's image

    If you need to do both inserts and updates in a single sink operation, then I suggest modifying the derived column to only change the value if row is marked for update. The isUpdate function is useful for this.

    iif (isUpdate(), 'MN',st)
    

  2. AnnuKumari-MSFT 33,986 Reputation points Microsoft Employee
    Feb 15, 2023, 9:04 AM

    Hi Bill LaLonde ,

    Thankyou for the detailed screenshot you have provided. It's really helpful.

    I understand you are trying to update the rows which are matching the condition est_number=='V18677A' , however, you can see no data in the sink table.

    The reason for having no data in the sink table is because you have used the truncate table script in pre-copy script option in sink settings .

    Because of that, the whole table is getting truncated and there's no rows to be updated even if it's matching the condition for at least one row. Kindly reinsert the data and then try same configuration except truncate table script.

    I tried to reproduce your case, I am getting the result as expected.

    alterrow


    Hope it helps. Please do consider clicking Accept Answer and mark it as helpful if it solves your problem.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.