ADF PII Detect and Masking - Drifted schema

Question


Muhammad Abdulbaqi 20

Hi,

I followed the Azure Data Factory documentation for PII detection and masking, and when I ran the debugger it seems the content is being treated as drifted. I followed this documentation (https://learn.microsoft.com/en-us/azure/data-factory/solution-template-pii-detection-and-masking) and made no changes whatsoever. I'm assuming I may have to set specific fields in the data source (the data is currently a short .txt file containing several PII values):

[Screenshot: data source settings]

Or perhaps in the request body:

[Screenshot: request body]

This is the data preview, by the way:

[Screenshot: data preview]

I assume this is why none of the PII entities are being placed in the appropriate fields. Additionally, the pipeline run was successful, so is there a place to view the new masked .txt document?

Any help is appreciated

Muhammad Abdulbaqi 20 Reputation points

2024-11-15T23:58:12.34+00:00

Hi, firstly, sorry if these questions are obvious; I'm relatively new to ADF. My .txt file contains a name and a phone number, and both name and phone_number are present in createRequestBody. I'm unclear on what I have to do here.

The data source projection contains several fields, including the ones I want.

The createRequestBody step has a column analysisInput = @(documents=array(@(id="1", language="en", text=concatWS(', ', name,email,phone_number,case_month,res_state,res_county,age_group,sex,race,ethnicity,process,exposure_yn,current_status,symptom_status,hosp_yn,icu_yn,icu_yn,death_yn,underlying_conditions_yn)))) and a column text = concatWS(', ', name,email,phone_number,case_month,res_state,res_county,age_group,sex,race,ethnicity,process,exposure_yn,current_status,symptom_status,hosp_yn,icu_yn,icu_yn,death_yn,underlying_conditions_yn). These were there by default; I didn't make any changes.

Perhaps this is an issue with externalCogCallServices, since it has a Mapping tab. When I preview it, even though the input document had a name and a phone number, these entities aren't recognized and placed in their respective columns; instead they end up in a column of their own. Any advice would be appreciated.
Muhammad Abdulbaqi 20 Reputation points

2024-11-16T00:23:20.24+00:00

Hi, one thing I noticed: when I tested the connection in externalcogservices, I got this error: Failure to read most recent page request: DF-REST_001 - Error response from server: Some({"error":{"code":"404","message": "Resource not found"}}), Status code: 404. Please check your request url and body. (url:https://dsdlang.cognitiveservices.azure.com/,request body: None, request method: GET). Could this be why the fields aren't mapped correctly? If so, how should I go about fixing it?
phemanth 15,765 Reputation points Microsoft External Staff Moderator

2024-11-19T17:20:01.5866667+00:00
@Muhammad Abdulbaqi

Let’s break down your questions

Understanding the Create Request Body

Your createRequestBody seems to be set up to concatenate multiple fields, including name and phone number. Here’s what you should check:

Field Names: Ensure that the field names in your text file exactly match those in the createRequestBody. Any discrepancies (like typos or case sensitivity) can prevent ADF from recognizing the fields correctly.

Mapping: In the externalCogCallServices, ensure that the mapping is correctly set up to recognize the fields you want to mask. If the mapping isn’t configured properly, it may not identify the PII entities.
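One quick way to check the field-name point is to compare the column list inside the concatWS expression from the question with the columns your source actually projects. The Python sketch below does that with an exact, case-sensitive comparison. The `source_columns` list is a hypothetical stand-in for a file that has only a name and a phone number; replace it with your own projection's column names. It also flags names listed more than once (the quoted expression contains icu_yn twice, which would simply repeat that value in the text).

```python
import re
from collections import Counter

# The text expression quoted in the question (template default)
EXPR = ("concatWS(', ', name,email,phone_number,case_month,res_state,res_county,"
        "age_group,sex,race,ethnicity,process,exposure_yn,current_status,"
        "symptom_status,hosp_yn,icu_yn,icu_yn,death_yn,underlying_conditions_yn)")


def referenced_fields(expr):
    """Return the column names passed to concatWS after the separator argument."""
    match = re.search(r"concatWS\('[^']*',\s*([^)]*)\)", expr)
    if match is None:
        raise ValueError("no concatWS(...) call found")
    return [f.strip() for f in match.group(1).split(",") if f.strip()]


def check_fields(expr, source_columns):
    """Report referenced columns missing from the source, and repeated names."""
    fields = referenced_fields(expr)
    missing = [f for f in fields if f not in source_columns]  # exact match only
    duplicates = [f for f, n in Counter(fields).items() if n > 1]
    return missing, duplicates


# Hypothetical projection of a small file with only two columns
source_columns = ["name", "phone_number"]
missing, duplicates = check_fields(EXPR, source_columns)
print("Referenced but not in source:", missing)
print("Listed more than once:", duplicates)
```

If the "missing" list is long, the expression is referencing template columns your file doesn't have, which is worth resolving before looking at the external call.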

404 Error in External Cognitive Services

The error message you received indicates that the URL for the Cognitive Services endpoint might be incorrect or the resource is not found. Here’s how to troubleshoot this:

Check the URL: Make sure that the URL you are using (https://dsdlang.cognitiveservices.azure.com/) is correct. It should point to the specific Cognitive Services resource you created in Azure.

Resource Availability: Ensure that the Cognitive Services resource is active and properly configured. You can check this in the Azure portal.

Authentication: Verify that you are using the correct authentication method (API key or token) required to access the service.
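On the 404 specifically: the test request in the error is a GET against the bare endpoint root (request body: None, request method: GET). As far as we understand the Language service REST API, PII detection is a POST to a path under the endpoint (for example `/language/:analyze-text`), so a GET on the root may return 404 even when the resource and key are fine. The sketch below builds, but does not send, such a request with Python's standard library so you can compare the URL, method, and headers with what the external call is configured to send. The api-version value and the `Ocp-Apim-Subscription-Key` header are assumptions based on the public REST API; please verify them against the current Language service reference. `<your-key>` is a placeholder.

```python
import json
import urllib.request


def build_pii_request(endpoint, api_key, text, api_version="2023-04-01"):
    """Build (but do not send) a PII analyze-text request for a Language resource."""
    # Assumption: the PII operation lives under /language/:analyze-text
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={api_version}"
    body = {
        "kind": "PiiEntityRecognition",
        "analysisInput": {"documents": [{"id": "1", "language": "en", "text": text}]},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,  # placeholder; don't hard-code real keys
        },
        method="POST",
    )


req = build_pii_request("https://dsdlang.cognitiveservices.azure.com/", "<your-key>",
                        "Call Jane Doe at 555-0100")
print(req.get_method(), req.full_url)
# Sending it requires a valid key: urllib.request.urlopen(req)
```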

Mapping Issues

If the input document contains name and phone number but they aren’t being recognized:

Test with Sample Data: Create a small sample text file with just the name and phone number to see if the issue persists. This can help isolate whether the problem is with the data or the configuration.

Review Mapping Tab: In the externalCogCallServices, check the mapping tab to ensure that the fields are correctly mapped to the expected output. If necessary, manually map the fields to ensure they align with your input data.
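It may also help to know what the service sends back. In the analyze-text PII response, each document carries a `redactedText` string and one `entities` array, where each entity is labeled with a PII `category` (for example `Person` or `PhoneNumber`), not with the name of the source column it came from. That may explain why the entities land in a single column of their own. The Python sketch below uses a hand-written, hypothetical response in that shape (trimmed, for example without confidence scores) and regroups entities by category, which is one way to get name-like and phone-like values into separate columns. In ADF the equivalent would be done with expressions after the external call; this sketch does not reproduce that.

```python
import json
from collections import defaultdict

# Hypothetical response in the analyze-text PII result shape
# (hand-written for illustration, not real service output)
RESPONSE = json.loads("""
{
  "kind": "PiiEntityRecognitionResults",
  "results": {
    "documents": [
      {
        "id": "1",
        "redactedText": "********, ********",
        "entities": [
          {"text": "Jane Doe", "category": "Person", "offset": 0, "length": 8},
          {"text": "555-0100", "category": "PhoneNumber", "offset": 10, "length": 8}
        ],
        "warnings": []
      }
    ],
    "errors": []
  }
}
""")


def entities_by_category(document):
    """Group detected entity texts by PII category, e.g. {'Person': [...]}."""
    grouped = defaultdict(list)
    for entity in document["entities"]:
        grouped[entity["category"]].append(entity["text"])
    return dict(grouped)


for doc in RESPONSE["results"]["documents"]:
    print(doc["id"], doc["redactedText"])
    print(entities_by_category(doc))
```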

Debugging Steps

Logs and Output: Check the logs for any additional error messages or warnings that might provide more context on what’s going wrong.

Data Preview: Use the Data Preview feature in ADF to see how the data is interpreted before and after the masking process.
phemanth 15,765 Reputation points Microsoft External Staff Moderator

2024-11-20T18:38:09.8333333+00:00

@Muhammad Abdulbaqi We haven't heard back from you on our last response and wanted to check whether you have found a resolution yet. If you have, please share it with the community, as it may help others. Otherwise, let us know and we will follow up with more details.

Answer 1

@Muhammad Abdulbaqi

Thanks for reaching out to Microsoft Q&A.

It sounds like you’re encountering a common issue with schema drift in Azure Data Factory (ADF) when using the PII detection and masking feature.

Check Schema Alignment

Ensure that the schema defined in ADF matches the actual structure of your text file. If there are discrepancies (like missing fields or different data types), it can lead to issues with PII detection.
You mentioned that your data is in a short text file. Make sure that the fields you expect to mask are correctly defined in the schema.

Adjust Data Source Settings

If your text file has a different structure than expected, you may need to adjust the data source settings. This includes specifying the correct delimiters and ensuring that the headers are correctly interpreted.
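If you are unsure which delimiter and header settings match your file, Python's standard `csv.Sniffer` can make a quick guess from a sample, which you can then compare with the column delimiter and "first row as header" settings of the source dataset. The sample string below is hypothetical; paste the first few lines of your own file instead. Sniffer is a heuristic, so treat its answer as a hint rather than a verdict.

```python
import csv

# Hypothetical first lines of the source file; paste your own instead
sample = "name,phone_number\nJane Doe,555-0100\nJohn Roe,555-0199\n"

sniffer = csv.Sniffer()
# Restrict the guess to common delimiters to reduce false positives
dialect = sniffer.sniff(sample, delimiters=",;\t|")
has_header = sniffer.has_header(sample)

print("delimiter:", repr(dialect.delimiter))
print("first row looks like a header:", has_header)
```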

Review Request Body

If you’re using a request body to specify PII entities, double-check that the fields are correctly referenced. Any mismatch here can prevent the system from identifying the PII correctly.
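To see what the service actually receives, here is a minimal Python sketch that mimics the analysisInput expression quoted in the question outside ADF. The documents/id/language/text shape comes from that expression; the `kind` field, the sample row values, and the helper names are assumptions for illustration. The local `concat_ws` skips missing values, which may not match the data flow function exactly.

```python
import json


def concat_ws(sep, *values):
    """Rough stand-in for the data flow concatWS(); this sketch skips None values."""
    return sep.join(str(v) for v in values if v is not None)


def build_request_body(row, fields):
    """Flatten the chosen columns into one text string, as the template expression does."""
    text = concat_ws(", ", *(row.get(f) for f in fields))
    return {
        # Assumption: operation kind used by the public analyze-text PII API
        "kind": "PiiEntityRecognition",
        "analysisInput": {
            # Same shape as the template expression: one document with id "1"
            "documents": [{"id": "1", "language": "en", "text": text}]
        },
    }


# Hypothetical row; keys follow the column names used in the template
row = {"name": "Jane Doe", "email": "jane.doe@example.com", "phone_number": "555-0100"}
body = build_request_body(row, ["name", "email", "phone_number"])
print(json.dumps(body, indent=2))
```

The point to notice is that the whole row becomes a single text string ("Jane Doe, jane.doe@example.com, 555-0100"), so the service has no knowledge of which column each value came from.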

Debugging the Pipeline

Since you mentioned that the pipeline ran successfully, check the output logs for any warnings or messages that might indicate what went wrong during the PII detection phase.
You can also use the Data Preview feature in ADF to see how the data is being interpreted before and after masking.

Viewing the Masked Document

After the pipeline runs, the masked output should be stored in the destination you specified in your pipeline configuration. Check the output dataset settings to find where the masked text document is saved.
If you haven’t specified an output location, you may need to set that up in your pipeline to ensure you can access the masked data.

Testing with Sample Data

If possible, create a small sample text file with known PII values and test the pipeline with that. This can help you isolate whether the issue is with the data itself or the configuration.
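A minimal Python sketch for creating such a test file: it writes two rows of fictional PII to a comma-delimited text file with a header row. The column names are borrowed from the template's concatWS expression (name, email, phone_number); the file name and values are hypothetical, so adjust them to match your source dataset before uploading the file. If the data flow still references the other template columns, you may need to remove them from the expression or add them to the file.

```python
import csv
from pathlib import Path

# Fictional sample values; column names match the first three template fields
FIELDS = ["name", "email", "phone_number"]
ROWS = [
    {"name": "Jane Doe", "email": "jane.doe@example.com", "phone_number": "555-0100"},
    {"name": "John Roe", "email": "john.roe@example.com", "phone_number": "555-0199"},
]


def write_sample(path):
    """Write a small header + two-row CSV-style .txt file for pipeline testing."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(ROWS)
    return Path(path)


sample_path = write_sample("pii_sample.txt")
print(sample_path.read_text(encoding="utf-8"))
```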

Hope this helps. Do let us know if you have any further queries.
