I think you need a way to identify new or changed data since the last load :
- If your
Events
table has a timestamp column that records the row insertion or last update time, you can use this to identify new rows - If the table has an auto-incrementing primary key, you can track the last copied key value
- Azure SQL Database supports CDC and CT, which can be used to track changes to the data
For Azure Data Factory configuration :
If you're using a timestamp or an auto-incrementing primary key for tracking changes, you can create a watermark table in Azure SQL DB. This table stores the last copied timestamp or ID value.
Use a Lookup activity in your ADF pipeline to retrieve the last copied watermark value from the watermark table or a variable.
Then, configure the source of your Copy Data activity to use a query that selects only the rows with a timestamp greater than the last watermark value or where the primary key is greater than the last copied ID.
For example, if using a timestamp column:
SELECT * FROM Events WHERE EventTimestamp > @lastWatermark
Or, if using an auto-incrementing ID:
SELECT * FROM Events WHERE EventID > @lastCopiedID
After copying the new rows, use a Stored Procedure activity or another Lookup to update the watermark table with the new last copied timestamp or ID.
To automate this process every 5 minutes, configure a Tumbling Window Trigger in ADF that initiates the pipeline at your specified frequency.
For direct API access, Snowflake offers Snowpipe, a service for continuously loading data. However, integrating Snowpipe directly with Azure SQL DB might require an intermediary service or custom code to extract the data and pass it to Snowpipe's REST API. This approach would be more complex and outside the standard ADF capabilities.