Custom Logs.

Apurva Tiwari 40 Reputation points
2024-07-15T08:16:44.56+00:00

I have a CSV file (say "empfile.csv") in my data lake, which is then loaded into an Azure Synapse dedicated SQL pool. So my data now exists in both the data lake and the SQL pool.

[screenshot: original table data]

Suppose I update some field values in this data (say "Manager" to "Sr. Manager" for emp_id 101) and upload the file again. This updates the table in Synapse with the new values.

[screenshot: updated table data]

Task: I want to create custom logs that provide output as below and store it in my warehouse:

[screenshot: desired log output]

Any idea how I can implement this, or does Azure provide any such functionality for auditing the data?
Thanks.


1 answer

  1. Amira Bedhiafi 24,181 Reputation points
    2024-07-15T11:29:16.35+00:00

    Azure Synapse does not natively provide a comprehensive change-tracking or auditing solution for this scenario, so you can build it yourself in the dedicated SQL pool using an audit table and a trigger.

    Start by creating a table in your SQL pool to store the audit logs.

     CREATE TABLE audit_logs (
         Audit_Id INT IDENTITY(1,1) PRIMARY KEY,
         TableName NVARCHAR(50),
         ChangedBy NVARCHAR(50),
         emp_id INT,
         ChangedFromValues NVARCHAR(MAX),
         ChangedToValues NVARCHAR(MAX),
         DateTime DATETIME
     )
    

    Then create a staging table where you can load the new data before merging it with the existing data.

    CREATE TABLE empfile_staging (
        emp_id INT,
        emp_name NVARCHAR(50),
        emp_design NVARCHAR(50),
        dept_id INT
    )
    

    Load the updated data into your staging table:

    COPY INTO empfile_staging
    FROM 'path_to_your_csv_file'
    WITH (
        FILE_TYPE = 'CSV',
        FIRSTROW = 2,            -- assuming the first row is a header
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n'
    );
    

    Then create a trigger that fires on update operations against your main table:

    CREATE TRIGGER trg_empfile_update
    ON empfile
    FOR UPDATE
    AS
    BEGIN
        -- Set-based insert so that multi-row updates (for example from the MERGE below)
        -- are all logged, not just a single row
        INSERT INTO audit_logs (TableName, ChangedBy, emp_id, ChangedFromValues, ChangedToValues, DateTime)
        SELECT 'empfile',
               'abc person',      -- replace with the actual user, e.g. SUSER_SNAME()
               i.emp_id,
               d.emp_design,      -- old value
               i.emp_design,      -- new value
               GETDATE()
        FROM INSERTED i
        JOIN DELETED d ON i.emp_id = d.emp_id
        WHERE d.emp_design <> i.emp_design;
    END
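
    To sanity-check the trigger before running the merge, you can try a one-off update using the example from the question (a minimal sketch; it assumes emp_id 101 already exists in empfile):

    -- Single-row update: the trigger should write one row to audit_logs
    UPDATE empfile
    SET emp_design = 'Sr. Manager'
    WHERE emp_id = 101;

    -- Inspect what was captured
    SELECT * FROM audit_logs WHERE emp_id = 101;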
    

    Merge the data from the staging table into the main table. Any updates will fire the trigger above.

    MERGE INTO empfile AS target
    USING empfile_staging AS source
    ON target.emp_id = source.emp_id
    WHEN MATCHED AND (target.emp_design <> source.emp_design OR target.emp_name <> source.emp_name OR target.dept_id <> source.dept_id)
    THEN
        UPDATE SET target.emp_name = source.emp_name,
                   target.emp_design = source.emp_design,
                   target.dept_id = source.dept_id
    WHEN NOT MATCHED
    THEN
        INSERT (emp_id, emp_name, emp_design, dept_id)
        VALUES (source.emp_id, source.emp_name, source.emp_design, source.dept_id); -- MERGE must be terminated with a semicolon
    

    Optionally, truncate the staging table after the merge operation to keep it clean for future use.

    TRUNCATE TABLE empfile_staging
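
    To review the captured changes, query the audit table (a minimal sketch; column names follow the audit_logs definition above):

    SELECT TableName, ChangedBy, emp_id, ChangedFromValues, ChangedToValues, DateTime
    FROM audit_logs
    ORDER BY DateTime DESC;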
    
