Indexers - Create
Creates a new indexer.
POST {endpoint}/indexers?api-version=2024-05-01-preview
URI Parameters
Name | In | Required | Type | Description |
---|---|---|---|---|
endpoint | path | True | string | The endpoint URL of the search service. |
api-version | query | True | string | Client Api Version. |
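The two URI parameters above combine into the request URL. The following is a minimal sketch of that composition; the helper name `build_create_indexer_url` is hypothetical and not part of any SDK.

```python
from urllib.parse import urlencode

# API version taken from the request line of this reference page.
API_VERSION = "2024-05-01-preview"


def build_create_indexer_url(endpoint: str, api_version: str = API_VERSION) -> str:
    """Compose the POST target from the `endpoint` path parameter
    and the `api-version` query parameter."""
    base = endpoint.rstrip("/")  # tolerate a trailing slash on the endpoint
    query = urlencode({"api-version": api_version})
    return f"{base}/indexers?{query}"


print(build_create_indexer_url("https://myservice.search.windows.net"))
```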
Request Header
Name | Required | Type | Description |
---|---|---|---|
x-ms-client-request-id | | string (uuid) | The tracking ID sent with the request to help with debugging. |
Request Body
Name | Required | Type | Description |
---|---|---|---|
dataSourceName | True | string | The name of the datasource from which this indexer reads data. |
name | True | string | The name of the indexer. |
targetIndexName | True | string | The name of the index to which this indexer writes data. |
@odata.etag | | string | The ETag of the indexer. |
cache | | SearchIndexerCache | Adds caching to an enrichment pipeline to allow for incremental modification steps without having to rebuild the index every time. |
description | | string | The description of the indexer. |
disabled | | boolean | A value indicating whether the indexer is disabled. Default is false. |
encryptionKey | | SearchResourceEncryptionKey | A description of an encryption key that you create in Azure Key Vault. This key is used to provide an additional level of encryption-at-rest for your indexer definition (as well as indexer execution status) when you want full assurance that no one, not even Microsoft, can decrypt them. Once you have encrypted your indexer definition, it will always remain encrypted. The search service will ignore attempts to set this property to null. You can change this property as needed if you want to rotate your encryption key; your indexer definition (and indexer execution status) will be unaffected. Encryption with customer-managed keys is not available for free search services, and is only available for paid services created on or after January 1, 2019. |
fieldMappings | | FieldMapping[] | Defines mappings between fields in the data source and corresponding target fields in the index. |
outputFieldMappings | | FieldMapping[] | Output field mappings are applied after enrichment and immediately before indexing. |
parameters | | IndexingParameters | Parameters for indexer execution. |
schedule | | IndexingSchedule | The schedule for this indexer. |
skillsetName | | string | The name of the skillset executing with this indexer. |
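Only three body properties are marked as required. A client-side pre-check along the following lines could catch an incomplete definition before the request is sent; this is a sketch, the helper `missing_required` is hypothetical, and the service performs its own validation regardless.

```python
# Properties marked Required = True in the Request Body table.
REQUIRED_PROPERTIES = ("name", "dataSourceName", "targetIndexName")


def missing_required(body: dict) -> list:
    """Return the required property names that are absent or empty in `body`."""
    return [key for key in REQUIRED_PROPERTIES if not body.get(key)]


minimal_body = {
    "name": "myindexer",
    "dataSourceName": "mydatasource",
    "targetIndexName": "orders",
}
print(missing_required(minimal_body))
print(missing_required({"name": "myindexer"}))
```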
Responses
Name | Type | Description |
---|---|---|
201 Created | SearchIndexer | |
Other Status Codes | ErrorResponse | Error response. |
Examples
SearchServiceCreateIndexer
Sample request
POST https://myservice.search.windows.net/indexers?api-version=2024-05-01-preview
{
  "name": "myindexer",
  "description": "a cool indexer",
  "dataSourceName": "mydatasource",
  "targetIndexName": "orders",
  "schedule": {
    "interval": "PT1H",
    "startTime": "2015-01-01T00:00:00Z"
  },
  "parameters": {
    "maxFailedItems": 10,
    "maxFailedItemsPerBatch": 5
  },
  "encryptionKey": {
    "keyVaultKeyName": "myUserManagedEncryptionKey-createdinAzureKeyVault",
    "keyVaultKeyVersion": "myKeyVersion-32charAlphaNumericString",
    "keyVaultUri": "https://myKeyVault.vault.azure.net",
    "accessCredentials": {
      "applicationId": "00000000-0000-0000-0000-000000000000",
      "applicationSecret": "<applicationSecret>"
    }
  }
}
Sample response
{
  "name": "myindexer",
  "description": "a cool indexer",
  "dataSourceName": "mydatasource",
  "targetIndexName": "orders",
  "schedule": {
    "interval": "PT1H",
    "startTime": "2015-01-01T00:00:00Z"
  },
  "parameters": {
    "maxFailedItems": 10,
    "maxFailedItemsPerBatch": 5
  },
  "fieldMappings": [],
  "disabled": false,
  "encryptionKey": {
    "keyVaultKeyName": "myUserManagedEncryptionKey-createdinAzureKeyVault",
    "keyVaultKeyVersion": "myKeyVersion-32charAlphaNumericString",
    "keyVaultUri": "https://myKeyVault.vault.azure.net",
    "accessCredentials": {
      "applicationId": "00000000-0000-0000-0000-000000000000",
      "applicationSecret": null
    }
  }
}
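The sample request above could be assembled from Python roughly as follows. This sketch only builds the request object and does not send it. The `api-key` and `Content-Type` headers are assumptions not documented on this page, and `build_create_indexer_request` is a hypothetical helper name.

```python
import json
import uuid
from urllib.request import Request


def build_create_indexer_request(url: str, body: dict, api_key: str) -> Request:
    """Build (but do not send) a POST request carrying an indexer definition."""
    headers = {
        "Content-Type": "application/json",           # assumption: JSON payload
        "api-key": api_key,                            # assumption: key-based auth header
        "x-ms-client-request-id": str(uuid.uuid4()),   # optional tracking ID (uuid)
    }
    payload = json.dumps(body).encode("utf-8")
    return Request(url, data=payload, headers=headers, method="POST")


request = build_create_indexer_request(
    "https://myservice.search.windows.net/indexers?api-version=2024-05-01-preview",
    {
        "name": "myindexer",
        "dataSourceName": "mydatasource",
        "targetIndexName": "orders",
        "schedule": {"interval": "PT1H", "startTime": "2015-01-01T00:00:00Z"},
    },
    api_key="<api-key>",
)
print(request.get_method(), request.full_url)
```

Passing the object to `urllib.request.urlopen` would send it; per the Responses table, a successful call is expected to return 201 Created with the indexer definition in the body.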
Definitions
Name | Description |
---|---|
AzureActiveDirectoryApplicationCredentials | Credentials of a registered application created for your search service, used for authenticated access to the encryption keys stored in Azure Key Vault. |
BlobIndexerDataToExtract | Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. |
BlobIndexerImageAction | Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. |
BlobIndexerParsingMode | Represents the parsing mode for indexing from an Azure blob data source. |
BlobIndexerPDFTextRotationAlgorithm | Determines algorithm for text extraction from PDF files in Azure blob storage. |
ErrorAdditionalInfo | The resource management error additional info. |
ErrorDetail | The error detail. |
ErrorResponse | Error response |
FieldMapping | Defines a mapping between a field in a data source and a target field in an index. |
FieldMappingFunction | Represents a function that transforms a value from a data source before indexing. |
IndexerExecutionEnvironment | Specifies the environment in which the indexer should execute. |
IndexingParameters | Represents parameters for indexer execution. |
IndexingParametersConfiguration | A dictionary of indexer-specific configuration properties. Each name is the name of a specific property. Each value must be of a primitive type. |
IndexingSchedule | Represents a schedule for indexer execution. |
SearchIndexer | Represents an indexer. |
SearchIndexerCache | |
SearchIndexerDataNoneIdentity | Clears the identity property of a datasource. |
SearchIndexerDataUserAssignedIdentity | Specifies the identity for a datasource to use. |
SearchResourceEncryptionKey | A customer-managed encryption key in Azure Key Vault. Keys that you create and manage can be used to encrypt or decrypt data-at-rest, such as indexes and synonym maps. |
AzureActiveDirectoryApplicationCredentials
Credentials of a registered application created for your search service, used for authenticated access to the encryption keys stored in Azure Key Vault.
Name | Type | Description |
---|---|---|
applicationId | string | An AAD Application ID that was granted the required access permissions to the Azure Key Vault that is to be used when encrypting your data at rest. The Application ID should not be confused with the Object ID for your AAD Application. |
applicationSecret | string | The authentication key of the specified AAD application. |
BlobIndexerDataToExtract
Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs.
Name | Type | Description |
---|---|---|
allMetadata | string | Extracts metadata provided by the Azure blob storage subsystem and the content-type specific metadata (for example, metadata unique to just .png files is indexed). |
contentAndMetadata | string | Extracts all metadata and textual content from each blob. |
storageMetadata | string | Indexes just the standard blob properties and user-specified metadata. |
BlobIndexerImageAction
Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer.
Name | Type | Description |
---|---|---|
generateNormalizedImagePerPage | string | Extracts text from images (for example, the word "STOP" from a traffic stop sign), and embeds it into the content field, but treats PDF files differently in that each page will be rendered as an image and normalized accordingly, instead of extracting embedded images. Non-PDF file types will be treated the same as if "generateNormalizedImages" was set. |
generateNormalizedImages | string | Extracts text from images (for example, the word "STOP" from a traffic stop sign), and embeds it into the content field. This action requires that "dataToExtract" is set to "contentAndMetadata". A normalized image refers to additional processing resulting in uniform image output, sized and rotated to promote consistent rendering when you include images in visual search results. This information is generated for each image when you use this option. |
none | string | Ignores embedded images or image files in the data set. This is the default. |
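The descriptions above state two constraints: any "imageAction" other than "none" requires an attached skillset, and "generateNormalizedImages" requires "dataToExtract" to be "contentAndMetadata". A client-side check might look like the following sketch; `check_image_action` is a hypothetical helper, and the service remains the authority on what it accepts.

```python
def check_image_action(indexer: dict) -> list:
    """Return human-readable problems with the imageAction settings, if any."""
    config = (indexer.get("parameters") or {}).get("configuration") or {}
    image_action = config.get("imageAction", "none")  # documented default
    data_to_extract = config.get("dataToExtract", "contentAndMetadata")  # documented default
    problems = []
    if image_action != "none":
        if not indexer.get("skillsetName"):
            problems.append("imageAction other than 'none' requires a skillset")
        if image_action == "generateNormalizedImages" and data_to_extract != "contentAndMetadata":
            problems.append("generateNormalizedImages requires dataToExtract = contentAndMetadata")
    return problems


indexer = {
    "name": "myindexer",
    "parameters": {"configuration": {"imageAction": "generateNormalizedImages",
                                     "dataToExtract": "storageMetadata"}},
}
for problem in check_image_action(indexer):
    print(problem)
```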
BlobIndexerParsingMode
Represents the parsing mode for indexing from an Azure blob data source.
Name | Type | Description |
---|---|---|
default | string | Set to default for normal file processing. |
delimitedText | string | Set to delimitedText when blobs are plain CSV files. |
json | string | Set to json to extract structured content from JSON files. |
jsonArray | string | Set to jsonArray to extract individual elements of a JSON array as separate documents. |
jsonLines | string | Set to jsonLines to extract individual JSON entities, separated by a new line, as separate documents. |
text | string | Set to text to improve indexing performance on plain text files in blob storage. |
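To make the difference between "jsonArray" and "jsonLines" concrete, the following local sketch splits blob text into documents the way the descriptions suggest. It illustrates the two input shapes only and is not the service's parser; both function names are hypothetical.

```python
import json


def split_json_array(blob_text: str) -> list:
    """jsonArray shape: one JSON array whose elements become separate documents."""
    documents = json.loads(blob_text)
    if not isinstance(documents, list):
        raise ValueError("expected a top-level JSON array")
    return documents


def split_json_lines(blob_text: str) -> list:
    """jsonLines shape: one JSON entity per line, each a separate document."""
    return [json.loads(line) for line in blob_text.splitlines() if line.strip()]


array_blob = '[{"id": "1"}, {"id": "2"}]'
lines_blob = '{"id": "1"}\n{"id": "2"}\n'
print(split_json_array(array_blob))
print(split_json_lines(lines_blob))
```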
BlobIndexerPDFTextRotationAlgorithm
Determines algorithm for text extraction from PDF files in Azure blob storage.
Name | Type | Description |
---|---|---|
detectAngles | string | May produce better and more readable text extraction from PDF files that have rotated text within them. Note that there may be a small performance impact when this parameter is used. This parameter only applies to PDF files, and only to PDFs with embedded text. If the rotated text appears within an embedded image in the PDF, this parameter does not apply. |
none | string | Leverages normal text extraction. This is the default. |
ErrorAdditionalInfo
The resource management error additional info.
Name | Type | Description |
---|---|---|
info | object | The additional info. |
type | string | The additional info type. |
ErrorDetail
The error detail.
Name | Type | Description |
---|---|---|
additionalInfo | ErrorAdditionalInfo[] | The error additional info. |
code | string | The error code. |
details | ErrorDetail[] | The error details. |
message | string | The error message. |
target | string | The error target. |
ErrorResponse
Error response
Name | Type | Description |
---|---|---|
error | ErrorDetail | The error object. |
FieldMapping
Defines a mapping between a field in a data source and a target field in an index.
Name | Type | Description |
---|---|---|
mappingFunction | FieldMappingFunction | A function to apply to each source field value before indexing. |
sourceFieldName | string | The name of the field in the data source. |
targetFieldName | string | The name of the target field in the index. Same as the source field name by default. |
FieldMappingFunction
Represents a function that transforms a value from a data source before indexing.
Name | Type | Description |
---|---|---|
name | string | The name of the field mapping function. |
parameters | object | A dictionary of parameter name/value pairs to pass to the function. Each value must be of a primitive type. |
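A "fieldMappings" array built from the FieldMapping and FieldMappingFunction shapes above might be composed as in this sketch. The helper `field_mapping` and the field names are hypothetical, and the function name `base64Encode` is an assumption that does not appear on this page; check the service documentation for the supported mapping functions.

```python
import json


def field_mapping(source, target=None, function_name=None, **function_params):
    """Build one FieldMapping object, omitting optional properties left unset."""
    mapping = {"sourceFieldName": source}
    if target is not None:
        mapping["targetFieldName"] = target  # defaults to the source name when omitted
    if function_name is not None:
        function = {"name": function_name}
        if function_params:
            function["parameters"] = function_params  # values must be primitives
        mapping["mappingFunction"] = function
    return mapping


mappings = [
    field_mapping("sourceId", "id"),                                # hypothetical field names
    field_mapping("sourcePath", "key", function_name="base64Encode"),  # assumed function name
]
print(json.dumps({"fieldMappings": mappings}, indent=2))
```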
IndexerExecutionEnvironment
Specifies the environment in which the indexer should execute.
Name | Type | Description |
---|---|---|
private | string | Indicates that the indexer should run with the environment provisioned specifically for the search service. This should only be specified as the execution environment if the indexer needs to access resources securely over shared private link resources. |
standard | string | Indicates that the search service can determine where the indexer should execute. This is the default environment when nothing is specified and is the recommended value. |
IndexingParameters
Represents parameters for indexer execution.
Name | Type | Default value | Description |
---|---|---|---|
batchSize | integer | | The number of items that are read from the data source and indexed as a single batch in order to improve performance. The default depends on the data source type. |
configuration | IndexingParametersConfiguration | | A dictionary of indexer-specific configuration properties. Each name is the name of a specific property. Each value must be of a primitive type. |
maxFailedItems | integer | 0 | The maximum number of items that can fail indexing for indexer execution to still be considered successful. -1 means no limit. Default is 0. |
maxFailedItemsPerBatch | integer | 0 | The maximum number of items in a single batch that can fail indexing for the batch to still be considered successful. -1 means no limit. Default is 0. |
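The two failure thresholds can be read as a per-batch check plus a check on the run total, with -1 disabling a limit. The sketch below models that reading of the descriptions; it is an interpretation, not the service's implementation, and the function names are hypothetical.

```python
def within_limit(failed: int, limit: int = 0) -> bool:
    """True if `failed` is tolerated by `limit`; -1 means no limit."""
    return limit == -1 or failed <= limit


def run_considered_successful(batch_failures, max_failed_items=0,
                              max_failed_items_per_batch=0) -> bool:
    """Apply maxFailedItemsPerBatch to each batch and maxFailedItems to the total."""
    every_batch_ok = all(
        within_limit(failed, max_failed_items_per_batch) for failed in batch_failures
    )
    total_ok = within_limit(sum(batch_failures), max_failed_items)
    return every_batch_ok and total_ok


# Limits from the sample request: maxFailedItems = 10, maxFailedItemsPerBatch = 5.
print(run_considered_successful([3, 4, 2], 10, 5))
print(run_considered_successful([6], 10, 5))
```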
IndexingParametersConfiguration
A dictionary of indexer-specific configuration properties. Each name is the name of a specific property. Each value must be of a primitive type.
Name | Type | Default value | Description |
---|---|---|---|
allowSkillsetToReadFileData | boolean | False | If true, will create a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill. |
dataToExtract | BlobIndexerDataToExtract | contentAndMetadata | Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. |
delimitedTextDelimiter | string | | For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|"). |
delimitedTextHeaders | string | | For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index. |
documentRoot | string | | For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property. |
excludedFileNameExtensions | string | | Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing. |
executionEnvironment | IndexerExecutionEnvironment | standard | Specifies the environment in which the indexer should execute. |
failOnUnprocessableDocument | boolean | False | For Azure blobs, set to false if you want to continue indexing if a document fails indexing. |
failOnUnsupportedContentType | boolean | False | For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance. |
firstLineContainsHeaders | boolean | True | For CSV blobs, indicates that the first (non-blank) line of each blob contains headers. |
imageAction | BlobIndexerImageAction | none | Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. |
indexStorageMetadataOnlyForOversizedDocuments | boolean | False | For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://docs.microsoft.com/azure/search/search-limits-quotas-capacity. |
indexedFileNameExtensions | string | | Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types. |
parsingMode | BlobIndexerParsingMode | default | Represents the parsing mode for indexing from an Azure blob data source. |
pdfTextRotationAlgorithm | BlobIndexerPDFTextRotationAlgorithm | none | Determines algorithm for text extraction from PDF files in Azure blob storage. |
queryTimeout | string | 00:05:00 | Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss". |
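The "queryTimeout" value is a string in "hh:mm:ss" form. A small sketch for producing it from a Python `timedelta` follows; `format_query_timeout` is a hypothetical helper, and the 10-minute value is only an illustration.

```python
from datetime import timedelta


def format_query_timeout(timeout: timedelta) -> str:
    """Render a timedelta as the "hh:mm:ss" string used by queryTimeout."""
    total_seconds = int(timeout.total_seconds())
    if total_seconds < 0:
        raise ValueError("timeout must not be negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Configuration fragment raising the timeout from the 5-minute default to 10 minutes.
configuration = {"queryTimeout": format_query_timeout(timedelta(minutes=10))}
print(configuration)
```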
IndexingSchedule
Represents a schedule for indexer execution.
Name | Type | Description |
---|---|---|
interval | string | The interval of time between indexer executions. |
startTime | string | The time when an indexer should start running. |
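The sample request uses "PT1H" for the interval and "2015-01-01T00:00:00Z" for the start time, which look like an ISO 8601 duration and timestamp. Assuming that format, the sketch below parses a simple day/hour/minute/second duration and projects the first few run times. It is a naive projection for illustration; the actual scheduling behavior of the service is not described on this page, and both function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Handles only the P[nD]T[nH][nM][nS] subset of ISO 8601 durations.
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_interval(text: str) -> timedelta:
    """Parse a duration such as "PT1H" or "P1DT30M" into a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def projected_runs(start_time: str, interval: str, count: int) -> list:
    """Return `count` timestamps spaced `interval` apart, beginning at `start_time`."""
    start = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    step = parse_interval(interval)
    return [start + i * step for i in range(count)]


for run in projected_runs("2015-01-01T00:00:00Z", "PT1H", 3):
    print(run.isoformat())
```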
SearchIndexer
Represents an indexer.
Name | Type | Default value | Description |
---|---|---|---|
@odata.etag | string | | The ETag of the indexer. |
cache | SearchIndexerCache | | Adds caching to an enrichment pipeline to allow for incremental modification steps without having to rebuild the index every time. |
dataSourceName | string | | The name of the datasource from which this indexer reads data. |
description | string | | The description of the indexer. |
disabled | boolean | False | A value indicating whether the indexer is disabled. Default is false. |
encryptionKey | SearchResourceEncryptionKey | | A description of an encryption key that you create in Azure Key Vault. This key is used to provide an additional level of encryption-at-rest for your indexer definition (as well as indexer execution status) when you want full assurance that no one, not even Microsoft, can decrypt them. Once you have encrypted your indexer definition, it will always remain encrypted. The search service will ignore attempts to set this property to null. You can change this property as needed if you want to rotate your encryption key; your indexer definition (and indexer execution status) will be unaffected. Encryption with customer-managed keys is not available for free search services, and is only available for paid services created on or after January 1, 2019. |
fieldMappings | FieldMapping[] | | Defines mappings between fields in the data source and corresponding target fields in the index. |
name | string | | The name of the indexer. |
outputFieldMappings | FieldMapping[] | | Output field mappings are applied after enrichment and immediately before indexing. |
parameters | IndexingParameters | | Parameters for indexer execution. |
schedule | IndexingSchedule | | The schedule for this indexer. |
skillsetName | string | | The name of the skillset executing with this indexer. |
targetIndexName | string | | The name of the index to which this indexer writes data. |
SearchIndexerCache
Name | Type | Description |
---|---|---|
enableReprocessing | boolean | Specifies whether incremental reprocessing is enabled. |
identity | SearchIndexerDataIdentity (SearchIndexerDataNoneIdentity or SearchIndexerDataUserAssignedIdentity) | The user-assigned managed identity used for connections to the enrichment cache. If the connection string indicates an identity (ResourceId) and it's not specified, the system-assigned managed identity is used. On updates to the indexer, if the identity is unspecified, the value remains unchanged. If set to "none", the value of this property is cleared. |
storageConnectionString | string | The connection string to the storage account where the cache data will be persisted. |
SearchIndexerDataNoneIdentity
Clears the identity property of a datasource.
Name | Type | Description |
---|---|---|
@odata.type | string: #Microsoft. | A URI fragment specifying the type of identity. |
SearchIndexerDataUserAssignedIdentity
Specifies the identity for a datasource to use.
Name | Type | Description |
---|---|---|
@odata.type | string: #Microsoft. | A URI fragment specifying the type of identity. |
userAssignedIdentity | string | The fully qualified Azure resource Id of a user assigned managed identity typically in the form "/subscriptions/12345678-1234-1234-1234-1234567890ab/resourceGroups/rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/myId" that should have been assigned to the search service. |
SearchResourceEncryptionKey
A customer-managed encryption key in Azure Key Vault. Keys that you create and manage can be used to encrypt or decrypt data-at-rest, such as indexes and synonym maps.
Name | Type | Description |
---|---|---|
accessCredentials | AzureActiveDirectoryApplicationCredentials | Optional Azure Active Directory credentials used for accessing your Azure Key Vault. Not required if using managed identity instead. |
identity | SearchIndexerDataIdentity (SearchIndexerDataNoneIdentity or SearchIndexerDataUserAssignedIdentity) | An explicit managed identity to use for this encryption key. If not specified and the access credentials property is null, the system-assigned managed identity is used. On update to the resource, if the explicit identity is unspecified, it remains unchanged. If "none" is specified, the value of this property is cleared. |
keyVaultKeyName | string | The name of your Azure Key Vault key to be used to encrypt your data at rest. |
keyVaultKeyVersion | string | The version of your Azure Key Vault key to be used to encrypt your data at rest. |
keyVaultUri | string | The URI of your Azure Key Vault, also referred to as DNS name, that contains the key to be used to encrypt your data at rest. An example URI might be |