Tutorial: Configure Azure Storage to de-identify documents
The Azure Health Data Services de-identification service can de-identify documents in Azure Storage via an asynchronous job. If you have many documents to
de-identify, a job is a good option. Jobs also provide consistent surrogation, meaning that surrogate values in the de-identified output match across
all documents. For more information about de-identification, including consistent surrogation, see What is the de-identification service?
When you choose to store documents in Azure Blob Storage, you're charged based on Azure Storage pricing. This cost isn't included in the
de-identification service pricing. Explore Azure Blob Storage pricing.
Install the Azure CLI and open your terminal of choice. This tutorial uses PowerShell.
Create a storage account and container
Set your context, substituting the subscription name containing your de-identification service for the <subscription_name> placeholder:
PowerShell
az account set --subscription "<subscription_name>"
Save a variable for the resource group, substituting the resource group containing your de-identification service for the <resource_group> placeholder:
PowerShell
$ResourceGroup = "<resource_group>"
Create a storage account, providing a value for the <storage_account_name> placeholder:
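The later commands in this tutorial rely on two variables, $StorageAccountName and $StorageAccountId. A minimal sketch of how this step could set them, assuming a general-purpose v2 account with locally redundant storage. The SKU, kind, TLS version, and public-access settings shown here are assumptions, so adjust them to your requirements:

```powershell
# Assumed settings: Standard_LRS, StorageV2, TLS 1.2 minimum, no anonymous blob access.
$StorageAccountName = "<storage_account_name>"
# Create the account and capture its resource ID for the role assignments that follow.
$StorageAccountId = $(az storage account create --name $StorageAccountName --resource-group $ResourceGroup --sku Standard_LRS --kind StorageV2 --min-tls-version TLS1_2 --allow-blob-public-access false --query id --output tsv)
```

The --query id --output tsv options return the new account's resource ID as plain text, so it can be passed as the --scope value in the role assignment commands.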
Assign yourself a role to perform data operations on the storage account:
PowerShell
$UserId = $(az ad signed-in-user show --query id -o tsv)
az role assignment create --role "Storage Blob Data Contributor" --assignee $UserId --scope $StorageAccountId
Create a container to hold your sample document:
PowerShell
az storage container create --account-name $StorageAccountName --name deidtest --auth-mode login
Upload a sample document
Next, you upload a document that contains synthetic protected health information (PHI):
PowerShell
$DocumentContent = "The patient came in for a visit on 10/12/2023 and was seen again November 4th at Contoso Hospital."
az storage blob upload --data $DocumentContent --account-name $StorageAccountName --container-name deidtest --name deidsample.txt --auth-mode login
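To confirm that the upload succeeded, you could list the blob names in the container. This is an optional check, sketched on the assumption that your own role assignment has already taken effect (new role assignments can take a few minutes to propagate):

```powershell
# Optional check: list the names of the blobs in the deidtest container.
az storage blob list --account-name $StorageAccountName --container-name deidtest --auth-mode login --query "[].name" --output tsv
```

If the upload worked, the output should include deidsample.txt.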
Grant the de-identification service access to the storage account
In this step, you grant the de-identification service's system-assigned managed identity role-based access to the storage account, which includes the container. You grant the Storage Blob
Data Contributor role because the de-identification service both reads the original document and writes de-identified output documents. Substitute the name of
your de-identification service for the <deid_service_name> placeholder:
PowerShell
$DeidServicePrincipalId = $(az resource show -n <deid_service_name> -g $ResourceGroup --resource-type microsoft.healthdataaiservices/deidservices --query identity.principalId --output tsv)
az role assignment create --assignee $DeidServicePrincipalId --role "Storage Blob Data Contributor" --scope $StorageAccountId
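As an optional check, you could list the roles that the managed identity holds at the storage account scope. This sketch assumes $DeidServicePrincipalId and $StorageAccountId are still set in your session:

```powershell
# Optional check: show the role names assigned to the de-identification service identity on this storage account.
az role assignment list --assignee $DeidServicePrincipalId --scope $StorageAccountId --query "[].roleDefinitionName" --output tsv
```

The output should include Storage Blob Data Contributor once the assignment is in place.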
Configure network isolation on the storage account
Next, you update the storage account to disable public network access and only allow access from trusted Azure services such as the de-identification service.
After running this command, you won't be able to view the storage container contents without setting a network exception.
Learn more at Configure Azure Storage firewalls and virtual networks.
PowerShell
az storage account update --name $StorageAccountName --public-network-access Disabled --bypass AzureServices
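To confirm the network settings, you could read them back from the storage account. This is a sketch that assumes the publicNetworkAccess and networkRuleSet.bypass property names appear in the output of az storage account show for your CLI version:

```powershell
# Optional check: print the public network access setting and the trusted-services bypass setting.
az storage account show --name $StorageAccountName --resource-group $ResourceGroup --query "[publicNetworkAccess, networkRuleSet.bypass]" --output tsv
```

This is a management-plane call, so it still works after public network access to the blob data is disabled.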
Clean up resources
When you're done with the storage account, you can delete it and its role assignments:
PowerShell
az role assignment delete --assignee $DeidServicePrincipalId --role "Storage Blob Data Contributor" --scope $StorageAccountId
az role assignment delete --assignee $UserId --role "Storage Blob Data Contributor" --scope $StorageAccountId
az storage account delete --ids $StorageAccountId --yes