Delete all records from a dim table which are not in the fact table using primary key

jennifer zen 341 Reputation points
2021-01-04T17:25:46.363+00:00

Hello experts,

I have a process where I populate these two tables-[dbo].[DIM_AGENTS] and [dbo].FACT_COMMISSION

The process first deletes the data from [dbo].[DIM_AGENTS] for the current year based on the below query

DELETE A FROM [dbo].[DIM_AGENTS] A
JOIN [dbo].[FACT_COMMISSION] F ON A.AGENT_KEY = F.AGENT_KEY
JOIN dbo.Dim_Calendar B ON F.DateKey = B.DateKey
WHERE B.CalendarYearMonth > = 202001

But my problem is whenever I want to refresh the data for previous year (WHERE B.CalendarYearMonth > = 201601)..I end up with this error-->Violation of PRIMARY KEY constraint 'PK_DIM_AGENT'. Cannot insert duplicate key in object 'dbo.DIM_AGENT'. The duplicate key value is (165863).

So I assume there are some records which has same key and it is causing the error.

My question is I want to delete all the records in the dbo.DIM_AGENT table which are not in the [dbo].[FACT_COMMISSION] table first and then delete all the records for which ever year I want...

I am thinking to do a left join and delete the records where F.AGENT_KEY IS NULL .But I need the other part of deleting records for the year mentioned as well...

Any suggestions please?

Thanks in advance :)
Jenni53353-screenshot-2021-01-04-172053.jpg

53288-screenshot-2021-01-04-172221.jpg53297-screenshot-2021-01-04-182412.jpg

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
13,948 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,651 questions
0 comments No comments
{count} votes

Accepted answer
  1. MelissaMa-MSFT 24,201 Reputation points
    2021-01-05T07:17:15.147+00:00

    Hi @jennifer zen ,

    It could be better for you to post some sample data and expected output so that we could proceed with testing.

    My question is I want to delete all the records in the dbo.DIM_AGENT table which are not in the [dbo].[FACT_COMMISSION] table first and then delete all the records for which ever year I want...

    If you would like to delete all the records in dbo.DIM_AGENT table which are not in the [dbo].[FACT_COMMISSION] table, below query could be work:

    DELETE A  
    FROM [dbo].[DIM_AGENTS] A  
    LEFT JOIN [dbo].[FACT_COMMISSION] F ON A.AGENT_KEY = F.AGENT_KEY  
    WHERE F.AGENT_KEY IS NULL  
    

    As mentioned by Erland, you could use NOT EXISTS . But a small modification on his query as below:

    DELETE A  
    FROM DIM_AGENTS A  
    WHERE NOT EXISTS (SELECT *  
        FROM FACT_COMMISSION F  
        JOIN Dim_Calendar C ON F.DateKey = C.DateKey  
       WHERE A.AGENT_KEY = F.AGENT_KEY      
           AND C.CalendarYearMonth  >= 201601)  
    

    Or using NOT IN:

    DELETE A  
    FROM DIM_AGENTS A  
    WHERE AGENT_KEY NOT IN (SELECT AGENT_KEY  
        FROM FACT_COMMISSION F  
        JOIN Dim_Calendar C ON F.DateKey = C.DateKey  
       WHERE A.AGENT_KEY = F.AGENT_KEY      
           AND C.CalendarYearMonth  >= 201601)  
    

    Best regards
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

4 additional answers

Sort by: Most helpful
  1. Tom Phillips 17,736 Reputation points
    2021-01-04T18:12:26.807+00:00

    First, the code you posted does not match your screen shots of the table structures. You have F.datekey = B.datekey. F does not have a datekey.

    Second, your error is apparently on duplicate AGENT_KEY. How did you determine AGENT_KEY? Is it an identity field? You likely have multiple dates pointed to the same AGENT_KEY, so you will never delete the AGENT_KEY in question. You should be only INSERTing missing AGENT_KEYs.


  2. Erland Sommarskog 112.6K Reputation points MVP
    2021-01-04T22:25:19.79+00:00

    It seems funny to me that FACT_COMMISSION has AGENT_KEY as its primary key. I would kind of expect there could be many rows for the same agent in the fact table.

    Anyway, you delete all agents that has a date key in 2020 or later, but then you refresh from 2016? Can't this lead to that you get duplicates for agents active in 2019 or earlier?

    Did you check the data for the agent in the error message, 165863?

    0 comments No comments

  3. jennifer zen 341 Reputation points
    2021-01-04T22:49:11.37+00:00

    Hello @Erland Sommarskog ,

    Thanks for you reply...

    No.. we usually only delete data and re populate for current year say 2020 every two month.. won't get this error
    But every year we do a one time refresh of the data for past 4 years...only then this issue occurs..(date will changed through a parameter in the package)

    (This is business requirements...)

    since its new year (2021),now has been requested to refresh the data from 2016 and this error shows up!

    Since the process is automated(through packages) it ends up as 'failed' until we manually find the duplicate keys and delete them from the table

    So I just need some help as how i can automate the deletion of the duplicate rows through the script.
    Like if I have to refresh the table again say next year, if there are duplicate keys ,i don't want to do it manually...let the delete script handle that...I will just pass the parameter for the date...if i change the delete script to as below... left join with is null... will it work?

    DELETE A FROM [dbo].[DIM_AGENTS] A
    LEFT JOIN [dbo].[FACT_COMMISSION] F ON A.AGENT_KEY = F.AGENT_KEY
    WHERE A.AGENT_KEY IS NULL

    0 comments No comments

  4. Erland Sommarskog 112.6K Reputation points MVP
    2021-01-04T23:02:23.043+00:00

    If it will work? I don't have your tables, so I can't test...

    But the originally query seemed to say:

    DELETE FROM DIM_AGENTS
    FROM DIM_AGENTS A
    WHERE EXISTS (SELECT *
        FROM FACT_COMMISSION F
        JOIN CALENDAR C ON F.DateKey = C.DateKey
       WHERE A.AGENT_KEY = F.AGENT_EKY    
           AND C.CalendarYearMonth  >= 202001)
    

    I would kind of expect that if you wanted to refresh from 2016, you would change the above to 201601, but maybe there is something I don't understand, because I think the query you post now is more like:

    DELETE FROM DIM_AGENTS
    FROM DIM_AGENTS A
    WHERE NOT EXISTS (SELECT *
        FROM FACT_COMMISSION F
        JOIN CALENDAR C ON F.DateKey = C.DateKey
       WHERE A.AGENT_KEY = F.AGENT_EKY    
           AND C.CalendarYearMonth  >= 201601)
    

    That is only delete agents that are not in the data from 2016 and on?

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.