delta merge - multiple conditions

Anthony Carroll 0 Reputation points
2024-12-18T13:23:29.2833333+00:00
import delta
Factevents = delta.DeltaTable.forPath(spark, 'abfss://*******************.dfs.core.windows.net/silver/C365/events/')

# Merge new data
(
    Factevents.alias('events')
    .merge(
        dfNewData.alias('New'),
        "events.uprn = New.uprn AND events.id = New.id"
    )
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)


Notebook extraction. As you can see there is multiple join conditions for the merge. Just using id column results in duplicates in target hence the inclusion of uprn to make composite / unique key. Doesn't work. Anyone able to tell me correct syntax please. Spent far too much time on this already. Thanks.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,373 questions
{count} votes

1 answer

Sort by: Most helpful
  1. phemanth 15,755 Reputation points Microsoft External Staff Moderator
    2024-12-18T17:04:50.1866667+00:00

    @Anthony Carroll

    Thanks for reaching out to Microsoft Q&A.

    Your syntax seems mostly correct, but there might be a few adjustments needed. Here's a refined version of your code:

    import delta
    from delta.tables import *
    # Load the Delta table
    Factevents = DeltaTable.forPath(spark, 'abfss://*******************.dfs.core.windows.net/silver/C365/events/')
    # Merge new data
    (
        Factevents.alias('events')
        .merge(
            dfNewData.alias('New'),
            "events.uprn = New.uprn AND events.id = New.id"
        )
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
    

    Here are a few steps to check:

    1. Data Types: Ensure that the data types of uprn and id columns in both events and New are the same.
    2. Null Values: Check for any null values in the uprn or id columns that might be causing issues.
    3. Primary Key: Make sure that the combination of uprn and id is unique in both datasets.

    I hope the above steps will resolve the issue, please do let us know if issue persists. Thank you

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.