You will need to implement a change data capture (CDC) or slowly changing dimension (SCD) mechanism in your ETL (Extract, Transform, Load) process. However, given that you are loading data on a daily basis, this might not be the best approach.
A better approach may be to create a history table with a composite primary key made up of `AUCTIONID` and `ADJUSTEDDATE`, so that each combination of `AUCTIONID` and `ADJUSTEDDATE` appears at most once in the history table.
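A sketch of such a history table, assuming T-SQL; the column types and the table name `HistoryTable` are assumptions, and `AMOUNT` is the tracked value from your source:

```sql
-- Sketch: history table keyed on the composite of AUCTIONID and ADJUSTEDDATE.
CREATE TABLE HistoryTable (
    AUCTIONID     INT           NOT NULL,
    ADJUSTEDDATE  DATE          NOT NULL,
    AMOUNT        DECIMAL(18,2) NULL,
    CONSTRAINT PK_HistoryTable PRIMARY KEY (AUCTIONID, ADJUSTEDDATE)
);
```

The primary key constraint also acts as a safety net: any accidental attempt to insert a duplicate key combination will fail rather than silently duplicating history.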
The SSIS/ADF job would first check whether a record with the same `AUCTIONID` and `ADJUSTEDDATE` exists in the history table. If not, it would append the new record. If the record does exist, it could either ignore it or update it, depending on your business requirements.
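If you need both behaviors (insert new rows, update changed ones), a single T-SQL `MERGE` covers both cases in one statement. This is a sketch; the table names `HistoryTable` and `StagingTable` and the `AMOUNT` update logic are assumptions to adapt to your schema:

```sql
-- Sketch: upsert from staging into history on the composite key.
MERGE HistoryTable AS h
USING StagingTable AS s
    ON h.AUCTIONID = s.AUCTIONID
   AND h.ADJUSTEDDATE = s.ADJUSTEDDATE
WHEN MATCHED AND h.AMOUNT <> s.AMOUNT THEN
    UPDATE SET h.AMOUNT = s.AMOUNT      -- drop this clause if existing rows should be ignored
WHEN NOT MATCHED BY TARGET THEN
    INSERT (AUCTIONID, ADJUSTEDDATE, AMOUNT)
    VALUES (s.AUCTIONID, s.ADJUSTEDDATE, s.AMOUNT);
```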
- Load the data from your PROD DB into a staging table in your reporting DB.
- Compare the staging table to your history table with a SQL statement like this:
```sql
SELECT s.*
FROM StagingTable s
LEFT JOIN HistoryTable h
    ON s.AUCTIONID = h.AUCTIONID
   AND s.ADJUSTEDDATE = h.ADJUSTEDDATE
WHERE h.AUCTIONID IS NULL
```
The above SQL statement will return all rows from your staging table that do not have a matching `AUCTIONID` and `ADJUSTEDDATE` in your history table.
- Use a "Data Flow Task" in SSIS to load the results of this SQL statement into your history table.
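If you prefer a set-based step over a Data Flow Task (for example, an Execute SQL Task in SSIS), the compare and load steps above can be combined into one insert. A sketch, using the same assumed table names:

```sql
-- Sketch: append only the staging rows not already present in history.
INSERT INTO HistoryTable (AUCTIONID, ADJUSTEDDATE, AMOUNT)
SELECT s.AUCTIONID, s.ADJUSTEDDATE, s.AMOUNT
FROM StagingTable s
LEFT JOIN HistoryTable h
    ON s.AUCTIONID = h.AUCTIONID
   AND s.ADJUSTEDDATE = h.ADJUSTEDDATE
WHERE h.AUCTIONID IS NULL;
```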
For ADF (Azure Data Factory), you'd be following a similar process - your pipeline would have a "Lookup" activity to check for existing records in the history table, a "Copy" activity to append non-existing records into the history table, and potentially a "Stored Procedure" or "Data Flow" activity if you want to handle updates to existing records in any way.
Remember to have a step to truncate or clean your staging table to make it ready for the next load.
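A minimal cleanup step, assuming the staging table holds nothing you need between loads:

```sql
-- Sketch: reset staging for the next daily load.
TRUNCATE TABLE StagingTable;
```

`TRUNCATE` is typically faster than `DELETE` for a full clear, but requires that no foreign keys reference the staging table.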
This approach ensures that your history table retains all unique records based on the combination of `AUCTIONID` and `ADJUSTEDDATE`, which will allow you to track the changes to `AMOUNT` over time for each `AUCTIONID`.