Where to store the Watermark table for Data Warehouse ingestion ?

kosmos 246 Reputation points
2022-10-10T11:58:52.933+00:00

I am following the Azure tutorial on data ingestion to Synapse Analytics DWH.

To do an incremental load, Microsoft Documentation proposes the use of Watermark tables to store the last modified date.

In the tutorial they use Azure SQL Database to store the Watermark table.

My question regarding this:

Why not storing the watermark table in Synpase SQL Dedicated Pool ?
I want to avoid to create another database just for watermark tables. I would like to keep it simple.

Thanks in advance!

Azure SQL Database
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,385 questions
{count} votes

Accepted answer
  1. Alberto Morillo 34,681 Reputation points
    2022-10-11T03:55:10.073+00:00

    The watermark table is in the data source because there is where you need to know what data changed in order to continue gather rows from where you left the previous time.

    In case Azure Synapse is not the source but the destination, you can definitely create a table on Azure Synapse that can mirror the data source. Imagine the data source is Azure Table Storage, you can create a table on Synapse that mirror the Azure Table Storage, and where the watermark value could be the date field in the Azure Storage Table. You can create a pipeline in Azure Synapse that move the data and then run a stored procedure that recreate the MAX Date time.


0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.