You will need to implement a change data capture (CDC) or slowly changing dimension (SCD) mechanism in your ETL (Extract, Transform, Load) process. However, given that you are loading data on a daily basis, this might not be the best approach.
A better approach may be to create a history table with a composite primary key made up of `AUCTIONID` and `ADJUSTEDDATE`, so that each combination of `AUCTIONID` and `ADJUSTEDDATE` appears at most once in the history table.
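A sketch of such a history table, assuming T-SQL; the column types and the table name `HistoryTable` are assumptions, and `AMOUNT` is the tracked value from your source:

```sql
-- Sketch: history table keyed on the composite of AUCTIONID and ADJUSTEDDATE.
CREATE TABLE HistoryTable (
    AUCTIONID     INT           NOT NULL,
    ADJUSTEDDATE  DATE          NOT NULL,
    AMOUNT        DECIMAL(18,2) NULL,
    CONSTRAINT PK_HistoryTable PRIMARY KEY (AUCTIONID, ADJUSTEDDATE)
);
```

The primary key constraint also acts as a safety net: any accidental attempt to insert a duplicate key combination will fail rather than silently duplicating history.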
The SSIS/ADF job would first check whether a record with the same `AUCTIONID` and `ADJUSTEDDATE` exists in the history table. If not, it would append the new record. If the record does exist, it could either ignore it or update it, depending on your business requirements.
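If you need both behaviors (insert new rows, update changed ones), a single T-SQL `MERGE` covers both cases in one statement. This is a sketch; the table names `HistoryTable` and `StagingTable` and the `AMOUNT` update logic are assumptions to adapt to your schema:

```sql
-- Sketch: upsert from staging into history on the composite key.
MERGE HistoryTable AS h
USING StagingTable AS s
    ON h.AUCTIONID = s.AUCTIONID
   AND h.ADJUSTEDDATE = s.ADJUSTEDDATE
WHEN MATCHED AND h.AMOUNT <> s.AMOUNT THEN
    UPDATE SET h.AMOUNT = s.AMOUNT      -- drop this clause if existing rows should be ignored
WHEN NOT MATCHED BY TARGET THEN
    INSERT (AUCTIONID, ADJUSTEDDATE, AMOUNT)
    VALUES (s.AUCTIONID, s.ADJUSTEDDATE, s.AMOUNT);
```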
- Load the data from your PROD DB into a staging table in your reporting DB.
- Compare the staging table to your history table with a SQL statement like this:
```sql
SELECT s.*
FROM StagingTable s
LEFT JOIN HistoryTable h
    ON s.AUCTIONID = h.AUCTIONID
   AND s.ADJUSTEDDATE = h.ADJUSTEDDATE
WHERE h.AUCTIONID IS NULL
```
The above SQL statement will return all rows from your staging table that do not have a matching `AUCTIONID` and `ADJUSTEDDATE` in your history table.
- Use a "Data Flow Task" in SSIS to load the results of this SQL statement into your history table.
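If you prefer a set-based step over a Data Flow Task (for example, an Execute SQL Task in SSIS), the compare and load steps above can be combined into one insert. A sketch, using the same assumed table names:

```sql
-- Sketch: append only the staging rows not already present in history.
INSERT INTO HistoryTable (AUCTIONID, ADJUSTEDDATE, AMOUNT)
SELECT s.AUCTIONID, s.ADJUSTEDDATE, s.AMOUNT
FROM StagingTable s
LEFT JOIN HistoryTable h
    ON s.AUCTIONID = h.AUCTIONID
   AND s.ADJUSTEDDATE = h.ADJUSTEDDATE
WHERE h.AUCTIONID IS NULL;
```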
For ADF (Azure Data Factory), you'd be following a similar process - your pipeline would have a "Lookup" activity to check for existing records in the history table, a "Copy" activity to append non-existing records into the history table, and potentially a "Stored Procedure" or "Data Flow" activity if you want to handle updates to existing records in any way.
Remember to have a step to truncate or clean your staging table to make it ready for the next load.
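A minimal cleanup step, assuming the staging table holds nothing you need between loads:

```sql
-- Sketch: reset staging for the next daily load.
TRUNCATE TABLE StagingTable;
```

`TRUNCATE` is typically faster than `DELETE` for a full clear, but requires that no foreign keys reference the staging table.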
This approach ensures that your history table retains all unique records based on the combination of `AUCTIONID` and `ADJUSTEDDATE`, which will allow you to track the changes to `AMOUNT` over time for each `AUCTIONID`.