Use JSON list as merge condition in Synapse data flow

Kumar, Arun 341 Reputation points
2023-12-21T04:34:43.6866667+00:00

I am using a data flow to upsert into a target Delta table (the source is also a Delta table). My source data has a JSON key-value structure in one of the columns (transaction), and I need to use it as the key column. The data in the transaction column looks like this:

{"ce":0.29,"cec":0.0002964954,"ceum":{"base_unit_id":"k2k0","from_base":"0.001","id":"k2l0","to_base":"1000","unit_abbreviation":"t","unit_class_id":"oonrg","unit_name":"metric tonne"},


In the $mergeCondition I currently use "id" as the key column. Since the transaction column holds key:value pairs as shown above, how can I use it as the key?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Answer accepted by question author
  1. Bhargava-MSFT 31,361 Reputation points Microsoft Employee Moderator
    2023-12-29T21:05:47.53+00:00

    Hello Kumar, Arun,

    I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to "Accept" the answer.

    Issue: How to use a JSON list as the merge condition in a Synapse data flow

    A data flow is used to upsert into a target Delta table (the source is also a Delta table). The source data has a JSON key-value structure in one of the columns (transaction), which needs to serve as the key column. The data in the transaction column looks like this:

    {"ce":0.29,"cec":0.0002964954,"ceum":{"base_unit_id":"k2k0","from_base":"0.001","id":"k2l0","to_base":"1000","unit_abbreviation":"t","unit_class_id":"oonrg","unit_name":"metric tonne"}

    The question is how to use the transaction column, which holds a list/array, as a key column in the data flow.

    Solution:

    One solution that resolved the issue was to flatten the transaction column with PySpark in a notebook and then perform the Delta merge from that notebook.
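
    The thread does not include the notebook code, so the following is only a minimal sketch of what the flatten-then-merge approach might look like. It assumes that transaction is stored as a JSON string, that the nested field ceum.id is the part wanted as a key (the thread does not say which field was used), and that the target Delta table already has a matching flattened column. The names flatten_key, build_merge_condition, upsert and ceum_id are hypothetical and chosen for illustration.

```python
def flatten_key(df, json_col="transaction", json_path="$.ceum.id", key_col="ceum_id"):
    """Add a plain column holding one value pulled out of the JSON string column.

    Assumption: json_col contains a JSON string; json_path and key_col are
    illustrative defaults, not taken from the original post.
    """
    from pyspark.sql import functions as F  # imported lazily; needs a Spark runtime

    return df.withColumn(key_col, F.get_json_object(F.col(json_col), json_path))


def build_merge_condition(key_cols, target_alias="t", source_alias="s"):
    """Build an equality condition over every key column, joined with AND."""
    return " AND ".join(
        f"{target_alias}.{c} = {source_alias}.{c}" for c in key_cols
    )


def upsert(spark, source_df, target_path, key_cols):
    """Upsert source_df into the Delta table at target_path on key_cols.

    Assumption: the target table already contains every column in key_cols.
    """
    from delta.tables import DeltaTable  # imported lazily; needs delta-spark

    target = DeltaTable.forPath(spark, target_path)
    (
        target.alias("t")
        .merge(source_df.alias("s"), build_merge_condition(key_cols))
        .whenMatchedUpdateAll()      # update rows whose keys already exist
        .whenNotMatchedInsertAll()   # insert rows with new keys
        .execute()
    )
```

    In a notebook this could be called as upsert(spark, flatten_key(source_df), target_path, ["id", "ceum_id"]), where source_df and target_path stand for your own source DataFrame and target location. A Delta merge fails when several source rows match the same target row, so the source may need de-duplicating on the key columns first.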

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    I hope this helps!

    Please remember to "Accept Answer" if any answer/reply helped, so that others in the community facing similar issues can easily find the solution.
