Create or Update Indexer (Preview REST API)

Applies to: 2023-07-01-Preview, 2021-04-30-Preview, 2020-06-30-Preview


2023-07-01-Preview (no changes).

2021-04-30-Preview adds managed identity support for enrichment cache and encryption keys:

  • "storageConnectionString" accepts a Resource ID for a system-assigned managed identity connection to Azure Storage. This property is under "cache". User-assigned managed identity is not supported.
  • "identity" accepts a user-assigned managed identity.

2020-06-30-Preview adds "cache", used during AI enrichment to reuse previously processed documents.

An indexer automates indexing from supported data sources by connecting to a predefined data source, retrieving and serializing data, and passing it to a search service for data ingestion. For AI enrichment of image and unstructured text, indexers can also accept a skillset that adds image and natural language processing.

You can use either POST or PUT on a create request. For either one, the request body provides the object definition.

POST https://[service name].search.windows.net/indexers?api-version=[api-version]
    Content-Type: application/json  
    api-key: [admin key]  

For update requests, use PUT and specify the indexer name on the URI.

PUT https://[service name].search.windows.net/indexers/[indexer name]?api-version=[api-version]
    Content-Type: application/json  
    api-key: [admin key]    

HTTPS is required for all service requests. If the indexer doesn't exist, it's created. If it already exists, it's updated to the new definition, but you must issue a Run Indexer request if you want the indexer to execute.

Creating an indexer adds it to your search service and runs it. If the request is successful, the index will be populated with searchable content from the data source.

Updating an indexer doesn't automatically run it, but depending on your modifications and the associated data source, a reset and rerun might be required. When you update an existing indexer, the entire definition is replaced with the contents of the request body. In general, the best pattern to use for updates is to retrieve the indexer definition with a GET, modify it, and then update it with PUT.
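The retrieve, modify, and replace pattern described above can be sketched in Python using only the standard library. This is a minimal sketch rather than a definitive client: the helper names (indexer_url, build_put_request), the service name, and the key are hypothetical, and the endpoint shape https://[service name].search.windows.net/indexers/[indexer name] is assumed. Nothing is sent over the network until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request

API_VERSION = "2023-07-01-Preview"

def indexer_url(service_name: str, indexer_name: str, api_version: str = API_VERSION) -> str:
    # Assumed endpoint shape for a named indexer on a search service.
    return (f"https://{service_name}.search.windows.net"
            f"/indexers/{indexer_name}?api-version={api_version}")

def build_put_request(service_name: str, admin_key: str, definition: dict) -> urllib.request.Request:
    # PUT replaces the entire definition, so the full object is sent.
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        indexer_url(service_name, definition["name"]),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )

# Step 1 (not shown): GET the current definition and parse the JSON response.
current = {"name": "myindexer", "dataSourceName": "orders-ds", "targetIndexName": "orders-idx"}
# Step 2: modify a copy of the full definition.
updated = {**current, "description": "hourly orders indexer"}
# Step 3: build the PUT request; urllib.request.urlopen(request) would send it.
request = build_put_request("my-search-service", "<admin-key>", updated)
```

Because the whole definition is replaced, starting from the retrieved object (rather than a hand-written fragment) avoids silently dropping properties such as the schedule or field mappings.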

Indexer configuration varies based on the type of data source. For data-platform-specific guidance on creating indexers, start with Indexers overview, which includes the complete list of related articles.


The maximum number of indexers that you can create varies by pricing tier. For more information, see Service limits for Azure AI Search.

URI Parameters

Parameter Description
service name Required. Set this to the unique, user-defined name of your search service.
indexer name Required on the URI if using PUT. The name must be lower case, start with a letter or number, have no slashes or dots, and be fewer than 128 characters. After you start the name with a letter or number, the rest of the name can include any letter, number, and dashes, as long as the dashes aren't consecutive.
api-version Required. The current preview version is 2023-07-01-Preview. See API versions for more versions.
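The naming rules for an indexer can be checked client-side before sending a request. The following is a rough sketch of those rules as a regular expression; the function name is hypothetical, and the service remains the final authority on what it accepts.

```python
import re

# One lowercase letter or digit, then lowercase letters, digits,
# or dashes that are not immediately followed by another dash.
_NAME_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?!-))*")

def is_valid_indexer_name(name: str) -> bool:
    # "Fewer than 128 characters" is read here as a length of at most 127.
    return len(name) < 128 and _NAME_PATTERN.fullmatch(name) is not None
```

Slashes, dots, and upper-case letters are rejected because they fall outside the allowed character set.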

Request Headers

The following table describes the required and optional request headers.

Fields Description
Content-Type Required. Set this to application/json
api-key Optional if you're using Azure roles and a bearer token is provided on the request, otherwise a key is required. An api-key is a unique, system-generated string that authenticates the request to your search service. Create requests must include an api-key header set to your admin key (as opposed to a query key). See Connect to Azure AI Search using key authentication for details.
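As an illustration of the two authentication options in the table, the sketch below assembles request headers from either an admin key or a bearer token. The function name is hypothetical, and the use of the standard Authorization: Bearer header for role-based access is an assumption of this sketch.

```python
from typing import Optional

def build_headers(api_key: Optional[str] = None, bearer_token: Optional[str] = None) -> dict:
    headers = {"Content-Type": "application/json"}
    if bearer_token:
        # Role-based access: the api-key header can be omitted.
        headers["Authorization"] = f"Bearer {bearer_token}"
    elif api_key:
        # Key-based access: create requests need an admin key, not a query key.
        headers["api-key"] = api_key
    else:
        raise ValueError("Provide either an admin api-key or a bearer token.")
    return headers
```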

Request Body

A data source, index, and skillset are part of an indexer definition, but each is an independent component that can be used in different combinations. For example, you could use the same data source with multiple indexers, or have multiple indexers write to a single index.

The following JSON is a high-level representation of the main parts of the definition.

    {
        "name" : (optional on PUT; required on POST) "Name of the indexer",
        "description" : (optional) "Anything you want, or nothing at all",
        "dataSourceName" : (required) "Name of an existing data source",
        "targetIndexName" : (required) "Name of an existing index",
        "skillsetName" : (required for AI enrichment) "Name of an existing skillset",
        "cache":  { ... },
        "schedule" : (optional but runs once immediately if unspecified) { ... },
        "parameters" : (optional) {
            "batchSize": null,
            "maxFailedItems": 0,
            "maxFailedItemsPerBatch": 0,
            "base64EncodeKeys": null,
            "configuration": { }
        },
        "fieldMappings" : (optional) [ ... ],
        "outputFieldMappings" : (required for AI enrichment) [ ... ],
        "encryptionKey":(optional) { },
        "disabled" : (optional) Boolean value indicating whether the indexer is disabled. False by default.
    }

Request contains the following properties:

Property Description
name Required. The name must be lower case, start with a letter or number, have no slashes or dots, and be fewer than 128 characters. After you start the name with a letter or number, the rest of the name can include any letter, number, and dashes, as long as the dashes aren't consecutive.
description Optional. Description of the indexer.
dataSourceName Required. Name of an existing data source that provides connection information and other properties.
targetIndexName Required. Name of an existing index.
skillsetName Required for AI enrichment. Name of an existing skillset.
cache Optional. Used with AI enrichment to enable reuse of unchanged documents.
schedule Optional, but runs once immediately if unspecified.
parameters Optional. Properties for modifying runtime behavior.
fieldMappings Optional. Used when source and destination fields have different names.
outputFieldMappings Required for AI enrichment. Maps output from a skillset to an index or projection.
encryptionKey Optional. Used to encrypt indexer data at rest with your own keys, managed in your Azure Key Vault. To learn more, see Azure AI Search encryption using customer-managed keys in Azure Key Vault.
disabled Optional. Boolean value indicating whether the indexer is disabled. False by default.
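To make the required and optional properties concrete, here is a small sketch that assembles a request body as a Python dictionary and serializes it to JSON. The function name and its arguments are hypothetical; only the property names come from the table above.

```python
import json
from typing import Optional

def build_indexer_definition(name: str, data_source_name: str, target_index_name: str,
                             skillset_name: Optional[str] = None,
                             output_field_mappings: Optional[list] = None,
                             schedule: Optional[dict] = None,
                             disabled: bool = False) -> dict:
    if skillset_name and not output_field_mappings:
        # outputFieldMappings is listed as required for AI enrichment.
        raise ValueError("A skillset needs outputFieldMappings.")
    definition = {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": target_index_name,
        "disabled": disabled,
    }
    if skillset_name:
        definition["skillsetName"] = skillset_name
        definition["outputFieldMappings"] = output_field_mappings
    if schedule:
        # Without a schedule, the indexer runs once when it is created.
        definition["schedule"] = schedule
    return definition

definition = build_indexer_definition("myindexer", "orders-ds", "orders-idx",
                                      schedule={"interval": "PT1H"})
print(json.dumps(definition, indent=2))
```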


Response

201 Created for a successful request.


Example: Text-based indexer with schedule and parameter

This example creates an indexer that copies data from the table referenced by the orders-ds data source to the orders-idx index on a schedule that starts on January 1, 2022 UTC and runs hourly. Each indexer invocation will be successful if no more than 5 items fail to be indexed in each batch, and no more than 10 items fail to be indexed in total. Field mappings provide a data path when field names and types don't match.

    {
        "name" : "myindexer",
        "description" : "a cool indexer",
        "dataSourceName" : "orders-ds",
        "targetIndexName" : "orders-idx",
        "fieldMappings" : [
          {
              "sourceFieldName" : "content",
              "targetFieldName" : "sourceContent"
          }
        ],
        "schedule" : { "interval" : "PT1H", "startTime" : "2022-01-01T00:00:00Z" },
        "parameters" : { "maxFailedItems" : 10, "maxFailedItemsPerBatch" : 5 }
    }

Example: Skillset indexer

This example demonstrates an AI enrichment, indicated by the reference to a skillset and outputFieldMappings that map skill outputs to fields in a search index. Skillsets are high-level resources, defined separately.

New in this preview and applicable to skillsets only, you can specify the cache property to reuse documents that are unaffected by changes in your skillset definition.

  {
    "dataSourceName" : "demo-data",
    "targetIndexName" : "demo-index",
    "skillsetName" : "demo-skillset",
    "cache" : {
        "storageConnectionString" : "DefaultEndpointsProtocol=https;AccountName=<storage-account-name>;AccountKey=<storage-account-key>;",
        "enableReprocessing": true
    },
    "fieldMappings" : [ ],
    "outputFieldMappings" : [
      {
        "sourceFieldName" : "/document/organizations",
        "targetFieldName" : "organizations"
      }
    ],
    "parameters": {
      "configuration": {
        "dataToExtract": "contentAndMetadata",
        "imageAction": "generateNormalizedImages"
      }
    }
  }

Example: Enrichment cache with a managed identity connection

This example illustrates the connection string format when using Azure Active Directory for authentication. The search service must be configured to use a managed identity. The identity must have "Storage Blob Data Contributor" permissions so that it can write to the cache. The connection string is the unique Resource ID of your storage account, and it must include the container used to store the cached enrichment.

  {
    "dataSourceName" : "demodata-ds",
    "targetIndexName" : "demo-index",
    "skillsetName" : "demo-skillset",
    "cache" : {
        "storageConnectionString" : "ResourceId=/subscriptions/<subscription-ID>/resourceGroups/<resource-group-name>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>/<container-name>;",
        "enableReprocessing": true
    },
    "fieldMappings" : [  ],
    "outputFieldMappings" :  [  ],
    "parameters": {  }
  }


Link Description
cache Configures caching for AI enrichment and skillset execution.
encryptionKey Configures a connection to Azure Key Vault for customer-managed encryption.
fieldMappings Source-to-destination field mappings for fields that don't match by name and type.
outputFieldMappings Maps nodes in an enriched document to fields in an index. Required if you are using skillsets.
parameters Configures an indexer. Parameters include general parameters and source-specific parameters.
schedule Specifies the interval and frequency of scheduled indexer execution.

cache (preview)

Incremental indexing is the ability to reuse enriched documents in the cache when processing a skillset. The most common scenario is reuse of OCR or image analysis of image files, which can be costly and time-consuming to process.

"cache" : {
    "storageConnectionString" : "<YOUR-STORAGE-ACCOUNT-CONNECTION-STRING>",
    "enableReprocessing": true
}

The cache object has required and optional properties.

Property Description
storageConnectionString Required. Specifies the storage account used to cache the intermediate results. Using the account you provide, the search service will create a blob container prefixed with ms-az-search-indexercache and completed with a GUID unique to the indexer. It must be set to either a full access connection string that includes a key, or the unique Resource ID of your storage account for requests that are authenticated using Azure AD.

To authenticate through Azure AD, the search service must be configured to use a managed identity, and that identity must have "Storage Blob Data Contributor" permission.
enableReprocessing Optional. Boolean property (true by default) that controls processing of incoming documents already represented in the cache. When true (default), documents already in the cache are reprocessed when you rerun the indexer, assuming your skill update affects that document. When false, existing documents aren't reprocessed, effectively prioritizing new, incoming content over existing content. You should only set enableReprocessing to false on a temporary basis. To ensure consistency across the corpus, enableReprocessing should be true most of the time, ensuring that all documents, both new and existing, are valid per the current skillset definition.
ID Read-only. Generated once the cache is created. The ID is the identifier of the container within the storage account that will be used as the cache for this indexer. This cache is unique to this indexer; if the indexer is deleted and recreated with the same name, the ID is regenerated. The ID can't be set; it's always generated by the service.
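The Resource ID form of "storageConnectionString" is easy to get wrong by hand, so it can help to assemble it programmatically. The sketch below follows the format shown in the managed identity example; the function name and the sample values are hypothetical.

```python
def managed_identity_cache(subscription_id: str, resource_group: str,
                           storage_account: str, container: str,
                           enable_reprocessing: bool = True) -> dict:
    # Resource ID of the storage account, followed by the cache container.
    resource_id = (f"/subscriptions/{subscription_id}"
                   f"/resourceGroups/{resource_group}"
                   f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
                   f"/{container}")
    return {
        # The value is prefixed with "ResourceId=" and terminated with ";".
        "storageConnectionString": f"ResourceId={resource_id};",
        "enableReprocessing": enable_reprocessing,
    }

cache = managed_identity_cache("<subscription-ID>", "demo-rg", "demostorage", "enrichment-cache")
```

The resulting dictionary can be assigned to the "cache" property of an indexer definition before it is serialized to JSON.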


schedule

An indexer can optionally specify a schedule. Without a schedule, the indexer runs immediately when you send the request: connecting to, crawling, and indexing the data source. For some scenarios, including long-running indexing jobs, schedules are used to extend the processing window beyond the 24-hour maximum. If a schedule is present, the indexer runs periodically according to the schedule. The scheduler is built in; you can't use an external scheduler. A schedule has the following attributes:

  • interval: Required. A duration value that specifies an interval or period for indexer runs. The smallest allowed interval is five minutes; the longest is one day. It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an ISO 8601 duration value). The pattern for this is: "P[nD][T[nH][nM]]". Examples: PT15M for every 15 minutes, PT2H for every 2 hours.

  • startTime: Optional. A UTC datetime when the indexer should start running.
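The interval format and its bounds can be validated before a request is sent. The following sketch parses the "P[nD][T[nH][nM]]" pattern given above into a timedelta and enforces the five-minute minimum and one-day maximum. The function name is hypothetical, and the parser deliberately covers only the pattern shown here, not every XSD duration.

```python
import re
from datetime import timedelta

# Mirrors the documented pattern P[nD][T[nH][nM]].
_INTERVAL = re.compile(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?")

def parse_interval(value: str) -> timedelta:
    match = _INTERVAL.fullmatch(value)
    if match is None:
        raise ValueError(f"Not a supported duration: {value!r}")
    days, hours, minutes = (int(group) if group else 0 for group in match.groups())
    interval = timedelta(days=days, hours=hours, minutes=minutes)
    # Smallest allowed interval is five minutes; the longest is one day.
    if not timedelta(minutes=5) <= interval <= timedelta(days=1):
        raise ValueError(f"Interval out of range: {value!r}")
    return interval
```

For example, parse_interval("PT15M") yields a 15-minute timedelta, while "PT1M" and "P2D" are rejected as out of range.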


If a scheduled indexer repeatedly fails on the same document each time it runs, the indexer begins running at a less frequent interval (but still at least once every 24 hours) until it successfully makes progress again. If you believe you have fixed the issue that caused the indexer to get stuck, you can run the indexer on demand. If that run makes progress, the indexer returns to its scheduled interval.


parameters

An indexer can optionally take configuration parameters that modify runtime behaviors. Configuration parameters are comma-delimited on the indexer request.

  {
    "name" : "my-blob-indexer-for-cognitive-search",
    ... other indexer properties
    "parameters" : {
          "batchSize": null,
          "maxFailedItems": 0,
          "maxFailedItemsPerBatch": 0,
          "base64EncodeKeys": null,
          "configuration" : {
              "parsingMode" : "json",
              "indexedFileNameExtensions" : ".json, .jpg, .png",
              "imageAction" : "generateNormalizedImages",
              "dataToExtract" : "contentAndMetadata"
          }
    }
  }

General parameters for all indexers

Parameter Type and allowed values Usage
"batchSize" Integer
Default is source-specific (1000 for Azure SQL Database and Azure Cosmos DB, 10 for Azure Blob Storage)
Specifies the number of items that are read from the data source and indexed as a single batch in order to improve performance.
"maxFailedItems" Integer
Default is 0
Number of errors to tolerate before an indexer run is considered a failure. Set to -1 if you don’t want any errors to stop the indexing process. You can retrieve information about failed items using Get Indexer Status.
"maxFailedItemsPerBatch" Integer
Default is 0
Number of errors to tolerate in each batch before an indexer run is considered a failure. Set to -1 if you don’t want any errors to stop the indexing process.
"base64EncodeKeys" Boolean
Default is true
Valid values are null, true, or false. When set to false, the indexer will not automatically base64 encode the values of the field designated as the document key. Setting this property eliminates the need to specify a mapping function that base64 encodes key values (such as dashes) that are not otherwise valid in a document key.
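The interaction between "maxFailedItems" and "maxFailedItemsPerBatch" can be illustrated with a simplified model. This sketch is an assumption-laden approximation, not the service's implementation: it takes a list of failure counts per batch (a hypothetical input) and reports whether a run would stay within both limits, treating -1 as "no limit".

```python
def run_succeeds(failures_per_batch: list,
                 max_failed_items: int = 0,
                 max_failed_items_per_batch: int = 0) -> bool:
    # Per-batch limit: no single batch may exceed it (unless set to -1).
    if max_failed_items_per_batch != -1 and any(
            count > max_failed_items_per_batch for count in failures_per_batch):
        return False
    # Overall limit: the total across all batches may not exceed it (unless -1).
    if max_failed_items != -1 and sum(failures_per_batch) > max_failed_items:
        return False
    return True
```

With the defaults of 0 for both parameters, a single failed item is enough for the run to count as a failure.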

Blob configuration parameters

Several parameters are exclusive to a particular indexer, such as Azure blob indexing.

Parameter Type and allowed values Usage
"parsingMode" String
For Azure blobs, set to text to improve indexing performance on plain text files in blob storage.
For CSV blobs, set to delimitedText when blobs are plain CSV files.
For JSON blobs, set to json to extract structured content or to jsonArray to extract individual elements of an array as separate documents in Azure AI Search. Use jsonLines to extract individual JSON entities, separated by a new line, as separate documents in Azure AI Search.
"excludedFileNameExtensions" String
comma-delimited list
For Azure blobs, ignore any file types in the list. For example, you could exclude ".png, .mp4" to skip over those files during indexing.
"indexedFileNameExtensions" String
comma-delimited list
For Azure blobs, selects blobs if the file extension is in the list. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types.
"failOnUnsupportedContentType" Boolean
false (default)
For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance.
"failOnUnprocessableDocument" Boolean
false (default)
For Azure blobs, set to false if you want to continue indexing if a document fails indexing.
"indexStorageMetadataOnlyForOversizedDocuments" Boolean
false (default)
For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see Service Limits.
"delimitedTextHeaders" String
comma-delimited list
For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index.
"delimitedTextDelimiter" String
single character
For CSV blobs, specifies the single-character delimiter that separates fields in CSV files where each line starts a new document (for example, "|").
"firstLineContainsHeaders" Boolean
true (default)
For CSV blobs, indicates that the first (non-blank) line of each blob contains headers.
"documentRoot" String
user-defined path
For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property.
"dataToExtract" String
"contentAndMetadata" (default)
For Azure blobs:
Set to "storageMetadata" to index just the standard blob properties and user-specified metadata.
Set to "allMetadata" to extract both the metadata provided by the Azure blob storage subsystem and the content-type specific metadata (for example, metadata unique to just .png files).
Set to "contentAndMetadata" to extract all metadata and textual content from each blob.

For image-analysis in AI enrichment, when "imageAction" is set to a value other than "none", the "dataToExtract" setting tells the indexer which data to extract from image content. Applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs.
"imageAction" String
For Azure blobs, set to "none" to ignore embedded images or image files in the data set. This is the default.

For image-analysis in AI enrichment, set to "generateNormalizedImages" to extract text from images (for example, the word "stop" from a traffic Stop sign), and embed it as part of the content field. During image analysis, the indexer creates an array of normalized images as part of document cracking, and embeds the generated information into the content field. This action requires that "dataToExtract" is set to "contentAndMetadata". A normalized image refers to additional processing resulting in uniform image output, sized and rotated to promote consistent rendering when you include images in visual search results (for example, same-size photographs in a graph control as seen in the JFK demo). This information is generated for each image when you use this option.

If you set to "generateNormalizedImagePerPage", PDF files will be treated differently in that instead of extracting embedded images, each page will be rendered as an image and normalized accordingly. Non-PDF file types will be treated the same as if "generateNormalizedImages" was set.

Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer.
"normalizedImageMaxWidth", "normalizedImageMaxHeight" Integer
Any integer between 50-10000
The maximum width or height (in pixels) respectively for normalized images generated when an "imageAction" is set. The default is 2000.

The default of 2000 pixels for the normalized images maximum width and height is based on the maximum sizes supported by the OCR skill and the image analysis skill. The OCR skill supports a maximum width and height of 4200 for non-English languages, and 10000 for English. If you increase the maximum limits, processing could fail on larger images depending on your skillset definition and the language of the documents.
"allowSkillsetToReadFileData" Boolean
false (default)
Setting the "allowSkillsetToReadFileData" parameter to true will create a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill. The object generated will be defined as follows: { "$type": "file", "data": "BASE64 encoded string of the file" }

Setting the "allowSkillsetToReadFileData" parameter to true requires that a skillset be attached to that indexer, that the "parsingMode" parameter is set to "default", "text" or "json", and the "dataToExtract" parameter is set to "contentAndMetadata" or "allMetadata".
"pdfTextRotationAlgorithm" String
"none" (default)
Setting the "pdfTextRotationAlgorithm" parameter to "detectAngles" may help produce better and more readable text extraction from PDF files that have rotated text within them. Note that there may be a small performance impact when this parameter is used. This parameter only applies to PDF files, and only to PDFs with embedded text. If the rotated text appears within an embedded image in the PDF, this parameter doesn't apply.

Setting the "pdfTextRotationAlgorithm" parameter to "detectAngles" requires that the "parsingMode" parameter is set to "default".
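Several blob parameters in the table depend on one another or on a skillset being attached. The sketch below collects those stated dependencies into a client-side check. It is a hedged illustration only: the function name is hypothetical, it assumes "default" is the parsing mode when none is given, and it does not claim to cover every rule the service enforces.

```python
def check_blob_configuration(configuration: dict, has_skillset: bool) -> list:
    problems = []
    # Assumed defaults when a setting is omitted.
    parsing_mode = configuration.get("parsingMode", "default")
    data_to_extract = configuration.get("dataToExtract", "contentAndMetadata")
    image_action = configuration.get("imageAction", "none")

    if image_action != "none" and not has_skillset:
        problems.append("imageAction other than 'none' requires a skillset")
    if image_action == "generateNormalizedImages" and data_to_extract != "contentAndMetadata":
        problems.append("generateNormalizedImages requires dataToExtract=contentAndMetadata")
    if configuration.get("allowSkillsetToReadFileData", False):
        if not has_skillset:
            problems.append("allowSkillsetToReadFileData requires a skillset")
        if parsing_mode not in ("default", "text", "json"):
            problems.append("allowSkillsetToReadFileData requires parsingMode default, text, or json")
        if data_to_extract not in ("contentAndMetadata", "allMetadata"):
            problems.append("allowSkillsetToReadFileData requires contentAndMetadata or allMetadata")
    if (configuration.get("pdfTextRotationAlgorithm", "none") == "detectAngles"
            and parsing_mode != "default"):
        problems.append("detectAngles requires parsingMode=default")
    return problems
```

An empty list means none of the checked dependencies were violated; each string in a non-empty list names one violated rule.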

Azure Cosmos DB configuration parameters

The following parameters are specific to Cosmos DB indexers.

Parameter Type and allowed values Usage
"assumeOrderByHighWaterMarkColumn" Boolean For Cosmos DB indexers with SQL API, set this parameter to provide a hint to Cosmos DB that the query used to return documents for indexing is in fact ordered by the _ts column. Setting this parameter gives you better results for incremental indexing scenarios.

Azure SQL configuration parameters

The following parameters are specific to Azure SQL Database.

Parameter Type and allowed values Usage
"queryTimeout" String
Set this parameter to override the 5-minute default.
"convertHighWaterMarkToRowVersion" Boolean Set this parameter to "true" to use the rowversion data type for the high water mark column. When this property is set to true, the indexer subtracts one from the rowversion value before the indexer runs. It does this because views with one-to-many joins may have rows with duplicate rowversion values. Subtracting one ensures the indexer query doesn't miss these rows.
"disableOrderByHighWaterMarkColumn" Boolean Set this parameter to "true" if you want to disable the ORDER BY behavior in the query used for change detection. If you're using the high water mark change detection policy, the indexer uses WHERE and ORDER BY clauses to track which rows need indexing (WHERE [High Water Mark Column] > [Current High Water Mark Value] ORDER BY [High Water Mark Column]). This parameter disables the ORDER BY behavior. Indexing will finish faster, but the trade off is that if the indexer is interrupted for any reason, the entire indexer job must be repeated in full.


fieldMappings

Create these when source-destination field names or types don't match, or when you want to specify a function. Field mappings are case-insensitive. See Define field mappings.

Attribute Description
sourceFieldName Required. Name of the source column.
targetFieldName Required. Name of the corresponding field in the search index.
mappingFunction Optional. Adds processing to source values en route to the search engine. For example, an arbitrary string value can be base64-encoded so it can be used to populate a document key field. A mapping function has a name and parameters. Valid values include base64Encode, base64Decode, extractTokenAtPosition, jsonArrayToStringCollection, urlEncode, and urlDecode.
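As an illustration of why a key field might need a mapping function, the sketch below builds a field mapping that names the base64Encode function and shows, locally, what URL-safe Base64 does to a value containing characters such as slashes and spaces. The field names are illustrative assumptions, and the local encoding is only an approximation: the exact Base64 variant applied by the service may differ.

```python
import base64

def key_safe(value: str) -> str:
    # URL-safe Base64 output uses only letters, digits, '-', '_', and '=' padding.
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")

# A field mapping that asks the indexer to encode the source value
# before writing it to the target key field (field names are hypothetical).
field_mapping = {
    "sourceFieldName": "metadata_storage_path",
    "targetFieldName": "id",
    "mappingFunction": {"name": "base64Encode"},
}

encoded = key_safe("https://example/container/my file.pdf")
```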



outputFieldMappings

Maps skill outputs (or nodes in an enrichment tree) to fields in a search index.

  "outputFieldMappings" : [
        {
          "sourceFieldName" : "/document/organizations",
          "targetFieldName" : "organizations"
        },
        {
          "sourceFieldName" : "/document/pages/*/keyPhrases/*",
          "targetFieldName" : "keyphrases"
        },
        {
          "sourceFieldName": "/document/languageCode",
          "targetFieldName": "language",
          "mappingFunction": null
        }
  ]


encryptionKey

Configures a connection to Azure Key Vault for supplemental customer-managed encryption keys (CMK). Encryption with customer-managed keys is not available for free services. For billable services, it's only available for search services created on or after 2019-01-01.

A connection to the key vault must be authenticated. You can use either "accessCredentials" or a managed identity for this purpose.

Managed identities can be system-assigned or user-assigned (preview). If the search service has both a system-assigned managed identity and a role assignment that grants read access to the key vault, you can omit both "identity" and "accessCredentials", and the request will authenticate using the system-assigned managed identity. If the search service has a user-assigned identity and a role assignment, set the "identity" property to the resource ID of that identity.

Attribute Description
keyVaultKeyName Required. Name of the Azure Key Vault key used for encryption.
keyVaultKeyVersion Required. Version of the Azure Key Vault key.
keyVaultUri Required. URI of Azure Key Vault (also referred to as DNS name) that provides the key. An example URI might be
accessCredentials Omit if you are using a managed identity. Otherwise, the properties of accessCredentials include applicationId (an Azure Active Directory Application ID that has access permissions to your specified Azure Key Vault), and applicationSecret (the authentication key of the specified Azure AD application).
identity Optional unless you are using a user-assigned managed identity for the search service connection to Azure Key Vault. The format is "/subscriptions/[subscription ID]/resourceGroups/[resource group name]/providers/Microsoft.ManagedIdentity/userAssignedIdentities/[managed identity name]".
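The three authentication choices for the key vault connection (system-assigned identity, user-assigned identity, or application credentials) can be sketched as a small builder. The function name, the sample vault URI, and the sample key values are hypothetical. The "identity" value is passed as a plain resource ID string, following the table above; the exact shape expected by a given API version is an assumption of this sketch.

```python
from typing import Optional

def build_encryption_key(key_name: str, key_version: str, vault_uri: str,
                         application_id: Optional[str] = None,
                         application_secret: Optional[str] = None,
                         identity: Optional[str] = None) -> dict:
    if identity and (application_id or application_secret):
        raise ValueError("Use either accessCredentials or a managed identity, not both.")
    if bool(application_id) != bool(application_secret):
        raise ValueError("accessCredentials needs both applicationId and applicationSecret.")
    key = {
        "keyVaultKeyName": key_name,
        "keyVaultKeyVersion": key_version,
        "keyVaultUri": vault_uri,
    }
    if application_id:
        key["accessCredentials"] = {
            "applicationId": application_id,
            "applicationSecret": application_secret,
        }
    elif identity:
        # Resource ID of a user-assigned managed identity.
        key["identity"] = identity
    # With neither set, the system-assigned managed identity is used.
    return key

# Hypothetical values; relies on the system-assigned identity.
system_key = build_encryption_key("my-key", "<key-version>", "https://example-vault.vault.azure.net")
```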

