Tutorial: Index permission metadata from ADLS Gen2 and query with permission-filtered results

This tutorial demonstrates how to index Azure Data Lake Storage (ADLS) Gen2 Access Control Lists (ACLs) and role-based access control (RBAC) scope into a search index using an indexer.

It also shows you how to structure a query that respects user access permissions. A successful query outcome confirms that permissions were transferred during indexing.

For more information about indexing ACLs, see Use an ADLS Gen2 indexer to ingest permission metadata.

In this tutorial, you learn how to:

  • Configure RBAC scope and ACLs on an adlsgen2 data source
  • Create an Azure AI Search index containing permission information fields
  • Create and run an indexer to ingest permission information into an index from a data source
  • Search the index you just created

To complete this tutorial, use a REST client and the 2025-05-01-preview REST API. There's currently no support for ACL indexing in the Azure portal.

Prerequisites

  • An Azure account with an active subscription. Create an account for free.

  • Microsoft Entra ID authentication and authorization. Services and apps must be in the same tenant. Role assignments are used for each authenticated connection. Users and groups must be in the same tenant. You should have users and groups to work with. Creating tenants and security principals is out of scope for this tutorial.

  • ADLS Gen2 with a hierarchical namespace.

  • Files in a hierarchical folder structure. This tutorial assumes the ADLS Gen2 demo folder structure, in which the sample file is at /Oregon/Portland/Data.txt. This tutorial guides you through ACL assignment on folders and files so that you can complete the exercise successfully.

  • Azure AI Search, any region. Basic tier or higher is required for managed identity support.

  • Visual Studio Code with a REST client or a Python client and Jupyter package.

Prepare sample data

Upload the state parks sample data to a container in ADLS Gen2. The container name should be "parks" and it should have two folders: "Oregon" and "Washington".

Check search service configuration

Your search service must be configured for Microsoft Entra ID authentication and authorization. Review this checklist to make sure you're prepared.

Get a personal identity token for local testing

This tutorial assumes a REST client on a local system, connecting to Azure over a public internet connection.

Follow these steps to acquire a personal identity token and set up Visual Studio Code for local connections to your Azure resources.
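If you work from a Python client, one way to get a token is to call the Azure CLI. The following is a minimal sketch, assuming the Azure CLI is installed and you've already run `az login`. The helper names `token_command` and `get_personal_token` are hypothetical, and on Windows the executable might need to be invoked as `az.cmd`.

```python
import subprocess

# Token scope for the Azure AI Search data plane.
SEARCH_SCOPE = "https://search.azure.com/.default"


def token_command(scope=SEARCH_SCOPE):
    """Build the Azure CLI command that prints a bearer token for the signed-in user."""
    return [
        "az", "account", "get-access-token",
        "--scope", scope,
        "--query", "accessToken",
        "--output", "tsv",
    ]


def get_personal_token(scope=SEARCH_SCOPE):
    """Run the CLI command and return the token string (requires a prior `az login`)."""
    result = subprocess.run(
        token_command(scope), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Call `get_personal_token()` once per session and reuse the value as the token in later requests. Tokens expire, so request a new one if a call starts returning an authorization error.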

Set permissions in ADLS Gen2

As a best practice, use Group sets rather than directly assigning User sets.

  1. Grant the search service identity read access to the container. The indexer connects to Azure Storage under the search service identity. The search service must have Storage Blob Data Reader permissions to retrieve data.

  2. Grant per-group or user permissions in the file hierarchy. In the file hierarchy, identify all Group and User sets that are assigned to containers, directories, and files.

  3. You can use the Azure portal to manage ACLs. In Storage Browser, select the Oregon directory and then select Manage ACL from the context menu.

  4. Add new security principals for users and groups.

  5. Remove existing principals for owning groups, owning users, and other. These principals aren't supported for ACL indexing during the public preview.

Create a search index for permission metadata

Create an index that contains fields for content and permission metadata.

Be sure to use the 2025-05-01-preview data plane REST API or a prerelease Azure SDK that provides equivalent functionality. The permission filter properties are only available in the preview APIs.

For demo purposes, the permission field has retrievable enabled so that you can check the values from the index. In a production environment, you should disable retrievable to avoid leaking sensitive information.

{
  "name" : "my-adlsgen2-acl-index",
  "fields": [
    {
      "name": "name", "type": "Edm.String",
      "searchable": true, "filterable": false, "retrievable": true
    },
    {
      "name": "description", "type": "Edm.String",
      "searchable": true, "filterable": false, "retrievable": true    
    },
    {
      "name": "location", "type": "Edm.String",
      "searchable": true, "filterable": false, "retrievable": true
    },
    {
      "name": "state", "type": "Edm.String",
      "searchable": true, "filterable": false, "retrievable": true
    },
    {
      "name": "AzureSearch_DocumentKey", "type": "Edm.String",
      "searchable": true, "filterable": false, "retrievable": true,
      "stored": true,
      "key": true
    },
    { 
      "name": "UserIds", "type": "Collection(Edm.String)", 
      "permissionFilter": "userIds", 
      "searchable": true, "filterable": false, "retrievable": true
    },
    { 
      "name": "GroupIds", "type": "Collection(Edm.String)", 
      "permissionFilter": "groupIds", 
      "searchable": true, "filterable": false, "retrievable": true
    },
    { 
      "name": "RbacScope", "type": "Edm.String", 
      "permissionFilter": "rbacScope", 
      "searchable": true, "filterable": false, "retrievable": true
    }
  ],
  "permissionFilterOption": "enabled"
}
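If you prefer Python over a REST client, the following sketch shows one way to prepare the create-or-update index request and to confirm which fields carry a permissionFilter. It's an illustration under assumptions: the endpoint and token values are placeholders, the helper names are hypothetical, and the schema is an abbreviated copy (key field plus the three permission fields) rather than the full definition above.

```python
import json

API_VERSION = "2025-05-01-preview"


def build_create_index_request(endpoint, token, index_definition):
    """Return (method, url, headers, body) for a create-or-update index call."""
    url = (
        f"{endpoint.rstrip('/')}/indexes/{index_definition['name']}"
        f"?api-version={API_VERSION}"
    )
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return "PUT", url, headers, json.dumps(index_definition).encode("utf-8")


def permission_fields(index_definition):
    """Map each permissionFilter kind to the index field that stores it."""
    return {
        field["permissionFilter"]: field["name"]
        for field in index_definition["fields"]
        if "permissionFilter" in field
    }


# Abbreviated schema: add the content fields from the full definition as needed.
index_definition = {
    "name": "my-adlsgen2-acl-index",
    "fields": [
        {"name": "AzureSearch_DocumentKey", "type": "Edm.String", "key": True},
        {"name": "UserIds", "type": "Collection(Edm.String)", "permissionFilter": "userIds"},
        {"name": "GroupIds", "type": "Collection(Edm.String)", "permissionFilter": "groupIds"},
        {"name": "RbacScope", "type": "Edm.String", "permissionFilter": "rbacScope"},
    ],
    "permissionFilterOption": "enabled",
}

# Placeholder endpoint and token; nothing is sent over the network here.
method, url, headers, body = build_create_index_request(
    "https://<your-service>.search.windows.net", "<token>", index_definition
)
print(method, url)
print(permission_fields(index_definition))
```

Serializing the definition with `json.dumps` also avoids hand-editing mistakes such as a missing comma between properties.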

Create a data source

Modify data source configuration to specify indexer permission ingestion and the types of permission metadata that you want to index.

A data source needs indexerPermissionOptions.

In this tutorial, use a system-assigned managed identity for the authenticated connection.

{
    "name" : "my-adlsgen2-acl-datasource",
    "type": "adlsgen2",
    "indexerPermissionOptions": ["userIds", "groupIds", "rbacScope"],
    "credentials": {
        "connectionString": "ResourceId=/subscriptions/<your subscription ID>/resourceGroups/<your resource group name>/providers/Microsoft.Storage/storageAccounts/<your storage account name>/;"
    },
    "container": {
        "name": "parks",
        "query": null
    }
}
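The same data source body can be assembled in Python so that the subscription, resource group, and storage account values are filled in from variables. This is a sketch, not a required step: the function names are hypothetical and the IDs passed in the example are placeholders. The connection string follows the ResourceId format shown in the definition above.

```python
import json


def resource_id_connection_string(subscription_id, resource_group, storage_account):
    """Build the managed-identity (ResourceId) connection string for a storage account."""
    return (
        f"ResourceId=/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}/;"
    )


def build_data_source(name, container, connection_string,
                      permission_options=("userIds", "groupIds", "rbacScope")):
    """Return an adlsgen2 data source definition that requests permission ingestion."""
    return {
        "name": name,
        "type": "adlsgen2",
        "indexerPermissionOptions": list(permission_options),
        "credentials": {"connectionString": connection_string},
        "container": {"name": container, "query": None},
    }


data_source = build_data_source(
    "my-adlsgen2-acl-datasource",
    "parks",
    # Placeholder values; substitute your own IDs and names.
    resource_id_connection_string("<subscription-id>", "<resource-group>", "<storage-account>"),
)
print(json.dumps(data_source, indent=2))
```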

Create and run the indexer

Indexer configuration for permission ingestion is primarily about defining fieldMappings from permission metadata.

{
  "name" : "my-adlsgen2-acl-indexer",
  "dataSourceName" : "my-adlsgen2-acl-datasource",
  "targetIndexName" : "my-adlsgen2-acl-index",
  "parameters": {
    "batchSize": null,
    "maxFailedItems": 0,
    "maxFailedItemsPerBatch": 0,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "delimitedText",
      "firstLineContainsHeaders": true,
      "delimitedTextDelimiter": ",",
      "delimitedTextHeaders": ""
    }
  },
  "fieldMappings": [
    { "sourceFieldName": "metadata_user_ids", "targetFieldName": "UserIds" },
    { "sourceFieldName": "metadata_group_ids", "targetFieldName": "GroupIds" },
    { "sourceFieldName": "metadata_rbac_scope", "targetFieldName": "RbacScope" }
  ]
}

After the indexer is created, it runs immediately and indexes the file content along with its permission metadata.
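Before you send the indexer definition, it can help to check that every field mapping targets a field that exists in the index. The following sketch does that check in Python and also builds the URL for the indexer status operation, which you can poll after the run. It assumes fieldMappings is a top-level property of the indexer definition, as in the Create Indexer REST API. The helper names are hypothetical, and the parameters section is omitted for brevity.

```python
API_VERSION = "2025-05-01-preview"

# Indexer definition reduced to the parts that matter for the mapping check.
indexer = {
    "name": "my-adlsgen2-acl-indexer",
    "dataSourceName": "my-adlsgen2-acl-datasource",
    "targetIndexName": "my-adlsgen2-acl-index",
    "fieldMappings": [
        {"sourceFieldName": "metadata_user_ids", "targetFieldName": "UserIds"},
        {"sourceFieldName": "metadata_group_ids", "targetFieldName": "GroupIds"},
        {"sourceFieldName": "metadata_rbac_scope", "targetFieldName": "RbacScope"},
    ],
}


def missing_targets(indexer_definition, index_field_names):
    """Return mapping targets that don't exist in the index, in mapping order."""
    known = set(index_field_names)
    return [
        mapping["targetFieldName"]
        for mapping in indexer_definition.get("fieldMappings", [])
        if mapping["targetFieldName"] not in known
    ]


def indexer_status_url(endpoint, indexer_name):
    """Build the URL for a GET request that returns indexer execution status."""
    return f"{endpoint.rstrip('/')}/indexers/{indexer_name}/status?api-version={API_VERSION}"


index_fields = [
    "name", "description", "location", "state",
    "AzureSearch_DocumentKey", "UserIds", "GroupIds", "RbacScope",
]
print(missing_targets(indexer, index_fields))  # prints []
```

An empty list means each permission mapping has a matching index field. A non-empty list names the fields to add to the index or fix in the mapping.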

Run a query to check results

Now that documents are loaded, you can issue queries against them by using Documents - Search Post (REST).

The URI is extended to include a query input, which is specified by using the /docs/search operator. The query token is passed in the request header. For more information, see Query-Time ACL and RBAC enforcement.

POST {{endpoint}}/indexes/my-adlsgen2-acl-index/docs/search?api-version=2025-05-01-preview
Authorization: Bearer {{search-token}}
x-ms-query-source-authorization: {{search-token}}
Content-Type: application/json

{
    "search": "*",
    "select": "name,description,location,GroupIds",
    "orderby": "name asc"
}
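For a Python client, the request above can be prepared as follows. This is a minimal sketch: `build_search_request` is a hypothetical helper, the endpoint and token are placeholders, and the index name is a parameter so that you can pass the name of the index you created. As in the REST example, the same token is used for both headers, and the x-ms-query-source-authorization value is the raw token without a "Bearer" prefix.

```python
import json

API_VERSION = "2025-05-01-preview"


def build_search_request(endpoint, index_name, token, select, search="*", orderby=None):
    """Return (url, headers, body) for a permission-filtered Documents - Search Post call."""
    url = (
        f"{endpoint.rstrip('/')}/indexes/{index_name}/docs/search"
        f"?api-version={API_VERSION}"
    )
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
        # Token of the querying user, used to filter results by permission.
        "x-ms-query-source-authorization": token,
    }
    payload = {"search": search, "select": ",".join(select)}
    if orderby:
        payload["orderby"] = orderby
    return url, headers, json.dumps(payload).encode("utf-8")


# Placeholder endpoint and token; nothing is sent over the network here.
url, headers, body = build_search_request(
    "https://<your-service>.search.windows.net",
    "my-adlsgen2-acl-index",
    "<token>",
    select=["name", "description", "location", "GroupIds"],
    orderby="name asc",
)
print(url)
print(body.decode("utf-8"))
```

To send the request, pass the three values to the HTTP client of your choice, for example `urllib.request.Request(url, data=body, headers=headers, method="POST")`.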