ADF PII Detect and Masking - Drifted schema

Muhammad Abdulbaqi 20 Reputation points
2024-11-15T19:30:46.6433333+00:00

Hi,

I followed the Azure Data Factory documentation for the PII detection and masking template, and when I run the debugger, the content shows up as drifted. I followed this documentation (https://learn.microsoft.com/en-us/azure/data-factory/solution-template-pii-detection-and-masking) and made no changes whatsoever. I'm assuming I may have to set specific fields in the data source (the data is currently a short txt file containing several PII values).

User's image

Or perhaps in the request body:

User's image

This is the data preview, by the way:

User's image

I assume this is why none of the PII entities are being placed in the appropriate fields. Additionally, the pipeline run was successful, so is there a place to view the new masked txt document?

Any help is appreciated

Azure Data Factory

1 answer

  1. phemanth 12,310 Reputation points Microsoft Vendor
    2024-11-15T19:45:02.6666667+00:00

    @Muhammad Abdulbaqi

    Thanks for reaching out to Microsoft Q&A.

    It sounds like you’re encountering a common issue with schema drift in Azure Data Factory (ADF) when using the PII detection and masking feature.

    1. Check Schema Alignment
    • Ensure that the schema defined in ADF matches the actual structure of your text file. If there are discrepancies (like missing fields or different data types), it can lead to issues with PII detection.
    • You mentioned that your data is in a short text file. Make sure that the fields you expect to mask are correctly defined in the schema.
    2. Adjust Data Source Settings
    • If your text file has a different structure than expected, you may need to adjust the data source settings. This includes specifying the correct delimiters and ensuring that the headers are correctly interpreted.
    3. Review Request Body
    • If you’re using a request body to specify PII entities, double-check that the fields are correctly referenced. Any mismatch here can prevent the system from identifying the PII correctly.
    4. Debugging the Pipeline
    • Since you mentioned that the pipeline ran successfully, check the output logs for any warnings or messages that might indicate what went wrong during the PII detection phase.
    • You can also use the Data Preview feature in ADF to see how the data is being interpreted before and after masking.
    5. Viewing the Masked Document
    • After the pipeline runs, the masked output should be stored in the destination you specified in your pipeline configuration. Check the output dataset settings to find where the masked text document is saved.
    • If you haven’t specified an output location, you may need to set that up in your pipeline to ensure you can access the masked data.
    6. Testing with Sample Data
    • If possible, create a small sample text file with known PII values and test the pipeline with that. This can help you isolate whether the issue is with the data itself or the configuration.
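
To make the request-body and sample-data checks above concrete, here is a minimal Python sketch you could run locally before debugging in ADF. It is an illustration only: `build_pii_request` and `mask_text` are hypothetical helper names, and the body shape is modeled on the Azure AI Language `PiiEntityRecognition` analyze-text request as I understand it, so the body used by your copy of the template may differ. The entity offsets are assumed to be plain Python character indices.

```python
import json


def build_pii_request(lines, language="en"):
    """Build a request body with one document per non-empty input line.

    Assumption: the field names follow the Azure AI Language analyze-text
    request for PII detection; compare them with the body in your template.
    """
    non_empty = (line.strip() for line in lines if line.strip())
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(non_empty, start=1)
    ]
    return {
        "kind": "PiiEntityRecognition",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {"documents": documents},
    }


def mask_text(text, entities, mask_char="*"):
    """Replace each detected span (offset, length) with mask characters.

    Assumption: offsets and lengths count Python characters.
    """
    chars = list(text)
    for entity in entities:
        start = entity["offset"]
        end = start + entity["length"]
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


# A small sample file with known PII values, as suggested in step 6.
sample = ["My name is Jane Doe.", "", "Call me at 555-0100."]
body = build_pii_request(sample)
print(json.dumps(body, indent=2))

# Hand-written entity span, standing in for a real service response.
print(mask_text(sample[0], [{"offset": 11, "length": 8}]))
```

Comparing the printed body with the one configured in the data flow is a quick way to spot a column reference that does not match what the source actually produces.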

    Hope this helps. Do let us know if you have any further queries.
