Troubleshooting common indexer errors and warnings in Azure Cognitive Search

This article provides information and solutions to common errors and warnings you might encounter during indexing and AI enrichment in Azure Cognitive Search.

Indexing stops when the error count exceeds 'maxFailedItems'.

If you want indexers to ignore these errors (and skip over "failed documents"), consider updating the maxFailedItems and maxFailedItemsPerBatch indexer parameters.
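The following minimal sketch shows the relevant part of an indexer definition, written as a Python dict in the shape of the REST payload. In the `parameters` object, a value of -1 for `maxFailedItems` and `maxFailedItemsPerBatch` means no limit, and the default of 0 stops at the first failure. The indexer, data source, and index names are placeholders. The `would_stop` helper is hypothetical and only models the rule stated above: indexing stops once the error count exceeds maxFailedItems.

```python
import json

# Placeholder names; only the "parameters" object matters for this example.
indexer_definition = {
    "name": "my-indexer",
    "dataSourceName": "my-datasource",
    "targetIndexName": "my-index",
    "parameters": {
        # -1 means "no limit": failed documents are skipped instead of stopping the run.
        "maxFailedItems": -1,
        "maxFailedItemsPerBatch": -1,
    },
}


def would_stop(failed_count: int, max_failed_items: int = 0) -> bool:
    """Model the rule that indexing stops once failures exceed maxFailedItems.

    A limit of -1 disables the check; the default of 0 stops on the first failure.
    """
    if max_failed_items == -1:
        return False
    return failed_count > max_failed_items


print(json.dumps(indexer_definition["parameters"], indent=2))
```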

Note

Each failed document, along with its document key (when available), shows up as an error in the indexer execution status. If you've set the indexer to tolerate failures, you can use the Index Documents API to upload those documents manually later.

The error information in this article can help you resolve errors, allowing indexing to continue.

Warnings don't stop indexing, but they do indicate conditions that could result in unexpected outcomes. Whether you take action or not depends on the data and your scenario.

Beginning with API version 2019-05-06, item-level indexer errors and warnings are structured to provide more clarity about causes and next steps. They contain the following properties:

Property: Key
Description: The document ID of the document impacted by the error or warning.
Example: https://<storageaccount>.blob.core.windows.net/jfk-1k/docid-32112954.pdf

Property: Name
Description: The operation name describing where the error or warning occurred. It's generated from the structure [category].[subcategory].[resourceType].[resourceName].
Examples: DocumentExtraction.azureblob.myBlobContainerName; Enrichment.WebApiSkill.mySkillName; Projection.SearchIndex.OutputFieldMapping.myOutputFieldName; Projection.SearchIndex.MergeOrUpload.myIndexName; Projection.KnowledgeStore.Table.myTableName

Property: Message
Description: A high-level description of the error or warning.
Example: Could not execute skill because the Web Api request failed.

Property: Details
Description: Any additional details that may help diagnose the issue, such as the WebApi response if executing a custom skill failed.
Example: link-cryptonyms-list - Error processing the request record : System.ArgumentNullException: Value cannot be null. Parameter name: source at System.Linq.Enumerable.All[TSource](IEnumerable`1 source, Func`2 predicate) at Microsoft.CognitiveSearch.WebApiSkills.JfkWebApiSkills. ...rest of stack trace...

Property: DocumentationLink
Description: A link to relevant documentation with detailed information to debug and resolve the issue. This link often points to one of the sections below on this page.
Example: https://go.microsoft.com/fwlink/?linkid=2106475

Error: Could not read document

Indexer was unable to read the document from the data source. This can happen due to:

Reason: Inconsistent field types across different documents
Details/Example: "Type of value has a mismatch with column type. Couldn't store '{47.6,-122.1}' in authors column. Expected type is JArray."; "Error converting data type nvarchar to float."; "Conversion failed when converting the nvarchar value '12 months' to data type int."; "Arithmetic overflow error converting expression to data type int."
Resolution: Ensure that the type of each field is the same across different documents. For example, if the 'startTime' field is a DateTime in the first document and a string in the second, you'll get this error.

Reason: Errors from the data source's underlying service
Details/Example: From Azure Cosmos DB: {"Errors":["Request rate is large"]}
Resolution: Check your storage instance to ensure it's healthy. You might need to adjust your scaling or partitioning.

Reason: Transient issues
Details/Example: "A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)"
Resolution: Occasionally there are unexpected connectivity issues. Try running the document through your indexer again later.
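Before indexing, it can help to scan a sample of source documents for fields whose value types differ. This Python sketch assumes the documents have already been loaded as dictionaries, for example from a JSON export. The helper is hypothetical and isn't part of any Azure SDK.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


def find_inconsistent_fields(documents: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each field whose values have more than one Python type to its sorted type names.

    None values are skipped, because a missing value isn't a type conflict.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            if value is not None:
                seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items() if len(types) > 1}


docs = [
    {"id": "1", "title": "Q1 report", "duration": 12},
    {"id": "2", "title": "Q2 report", "duration": "12 months"},
]
print(find_inconsistent_fields(docs))  # {'duration': ['int', 'str']}
```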

Error: Could not extract content or metadata from your document

Indexer with a Blob data source was unable to extract the content or metadata from the document (for example, a PDF file). This can happen due to:

Reason: Blob is over the size limit
Details/Example: "Document is '150441598' bytes, which exceeds the maximum size '134217728' bytes for document extraction for your current service tier."
Resolution: See Blob indexing errors.

Reason: Blob has unsupported content type
Details/Example: "Document has unsupported content type 'image/png'"
Resolution: See Blob indexing errors.

Reason: Blob is encrypted
Details/Example: "Document could not be processed - it may be encrypted or password protected."
Resolution: You can skip the blob with blob settings.

Reason: Transient issues
Details/Example: "Error processing blob: The request was aborted: The request was canceled."; "Document timed out during processing."
Resolution: Occasionally there are unexpected connectivity issues. Try running the document through your indexer again later.

Error: Could not parse document

Indexer read the document from the data source, but there was an issue converting the document content into the specified field mapping schema. This can happen due to:

Reason: The document key is missing
Details/Example: "Document key cannot be missing or empty"
Resolution: Ensure all documents have valid document keys. The document key is determined by setting the 'key' property as part of the index definition. Indexers emit this error when the property flagged as the 'key' can't be found on a particular document.

Reason: The document key is invalid
Details/Example: "Document key cannot be longer than 1024 characters"
Resolution: Modify the document key to meet the validation requirements.

Reason: Could not apply field mapping to a field
Details/Example: "Could not apply mapping function 'functionName' to field 'fieldName'. Array cannot be null. Parameter name: bytes"
Resolution: Double-check the field mappings defined on the indexer, and compare them with the data of the specified field of the failed document. You may need to modify the field mappings or the document data.

Reason: Could not read field value
Details/Example: "Could not read the value of column 'fieldName' at index 'fieldIndex'. A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)"
Resolution: These errors are typically due to unexpected connectivity issues with the data source's underlying service. Try running the document through your indexer again later.
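This is a pre-flight check for the two key conditions listed above: a missing or empty key, and a key longer than 1024 characters. The `key_field` argument is assumed to be the source field that maps to the index field flagged as `key`. The helper is hypothetical and doesn't cover any other key validation rules.

```python
from typing import Any, Dict, List

MAX_KEY_LENGTH = 1024  # from "Document key cannot be longer than 1024 characters"


def key_problems(document: Dict[str, Any], key_field: str) -> List[str]:
    """Return problems with a document's key value; an empty list means none were found."""
    value = document.get(key_field)
    if value is None or value == "":
        return [f"key field '{key_field}' is missing or empty"]
    key = str(value)
    if len(key) > MAX_KEY_LENGTH:
        return [f"key is {len(key)} characters; the limit is {MAX_KEY_LENGTH}"]
    return []


for doc in [{"id": "doc-1"}, {"title": "no key"}, {"id": "x" * 2000}]:
    print(key_problems(doc, "id"))
```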

Error: Could not map output field 'xyz' to search index due to deserialization problem while applying mapping function 'abc'

The output mapping might have failed because the output data is in the wrong format for the mapping function you're using. For example, applying the base64Encode mapping function to binary data would generate this error. To resolve the issue, either rerun the indexer without specifying a mapping function, or ensure that the mapping function is compatible with the output field's data type. See Output field mapping for details.

Error: Could not execute skill

The indexer was not able to run a skill in the skillset.

Reason: Transient connectivity issues
Details/Example: "A transient error occurred. Please try again later."
Resolution: Occasionally there are unexpected connectivity issues. Try running the document through your indexer again later.

Reason: Potential product bug
Details/Example: "An unexpected error occurred."
Resolution: This indicates an unknown class of failure and may indicate a product bug. File a support ticket to get help.

Reason: A skill has encountered an error during execution
Details/Example: (From Merge Skill) "One or more offset values were invalid and could not be parsed. Items were inserted at the end of the text"
Resolution: Use the information in the error message to fix the issue. This kind of failure requires action to resolve.

Error: Could not execute skill because the Web API request failed

The skill execution failed because the call to the Web API failed. Typically, this class of failure occurs when custom skills are used, in which case you'll need to debug your custom code to resolve the issue. If instead the failure is from a built-in skill, refer to the error message for help in fixing the issue.

While debugging this issue, be sure to pay attention to any skill input warnings for this skill. Your Web API endpoint may be failing because the indexer is passing it unexpected input.

Error: Could not execute skill because Web API skill response is invalid

The skill execution failed because the call to the Web API returned an invalid response. Typically, this class of failure occurs when custom skills are used, in which case you'll need to debug your custom code to resolve the issue. If instead the failure is from a built-in skill, file a support ticket to get assistance.

Error: Type of value has a mismatch with column type. Couldn't store in 'xyz' column. Expected type is 'abc'

If your data source has a field with a different data type than the field you're trying to map in your index, you may encounter this error. Check your data source field data types and make sure they are mapped correctly to your index data types.

Error: Skill did not execute within the time limit

There are two cases in which you may encounter this error message, and each should be treated differently. Follow the instructions below depending on which skill returned this error.

Built-in Cognitive Service skills

Many of the built-in cognitive skills, such as language detection, entity recognition, or OCR, are backed by a Cognitive Service API endpoint. Sometimes there are transient issues with these endpoints and a request will time out. For transient issues, there is no remedy except to wait and try again. As a mitigation, consider setting your indexer to run on a schedule. Scheduled indexing picks up where it left off. Assuming transient issues are resolved, indexing and cognitive skill processing should be able to continue on the next scheduled run.

If you continue to see this error on the same document for a built-in cognitive skill, file a support ticket to get assistance, as this isn't expected.

Custom skills

If you encounter a timeout error with a custom skill, there are a couple of things you can try. First, review your custom skill and ensure that it's not getting stuck in an infinite loop and that it's returning a result consistently. Once you have confirmed that a result is returned, check the duration of execution. If you didn't explicitly set a timeout value on your custom skill definition, then the default timeout is 30 seconds. If 30 seconds isn't long enough for your skill to execute, you may specify a higher timeout value on your custom skill definition. Here's an example of a custom skill definition where the timeout is set to 90 seconds:

  {
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "uri": "<your custom skill uri>",
    "batchSize": 1,
    "timeout": "PT90S",
    "context": "/document",
    "inputs": [
      {
        "name": "input",
        "source": "/document/content"
      }
    ],
    "outputs": [
      {
        "name": "output",
        "targetName": "output"
      }
    ]
  }

The maximum value that you can set for the timeout parameter is 230 seconds. If your custom skill is unable to execute consistently within 230 seconds, you may consider reducing the batchSize of your custom skill so that it will have fewer documents to process within a single execution. If you have already set your batchSize to 1, you'll need to rewrite the skill to be able to execute in under 230 seconds or otherwise split it into multiple custom skills so that the execution time for any single custom skill is a maximum of 230 seconds. Review the custom skill documentation for more information.
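You can check the timeout rules above before you deploy a skillset. This Python sketch assumes the timeout is written as a simple ISO 8601 time duration such as `PT90S`, which is the format used in the example. It applies the 30-second default and the 230-second maximum described here. The helper names are hypothetical.

```python
import re

DEFAULT_TIMEOUT_SECONDS = 30
MAX_TIMEOUT_SECONDS = 230

# Handles only the time part of an ISO 8601 duration, e.g. "PT90S" or "PT3M50S".
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def timeout_seconds(skill: dict) -> int:
    """Return the effective timeout of a WebApiSkill definition, in seconds."""
    raw = skill.get("timeout")
    if raw is None:
        return DEFAULT_TIMEOUT_SECONDS
    match = _DURATION.match(raw)
    if not match or raw == "PT":
        raise ValueError(f"unsupported duration: {raw!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def check_timeout(skill: dict) -> int:
    """Raise if the timeout exceeds the 230-second maximum; otherwise return it."""
    total = timeout_seconds(skill)
    if total > MAX_TIMEOUT_SECONDS:
        raise ValueError(f"timeout {total}s exceeds the {MAX_TIMEOUT_SECONDS}s maximum")
    return total


skill = {"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill", "timeout": "PT90S", "batchSize": 1}
print(check_timeout(skill))  # 90
```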

Error: Could not 'MergeOrUpload' | 'Delete' document to the search index

The document was read and processed, but the indexer could not add it to the search index. This can happen due to:

Reason: A field contains a term that is too large
Details/Example: A term in your document is larger than the 32 KB limit.
Resolution: You can avoid this restriction by ensuring the field isn't configured as filterable, facetable, or sortable.

Reason: Document is too large to be indexed
Details/Example: A document is larger than the maximum API request size.
Resolution: See How to index large data sets.

Reason: Document contains too many objects in collection
Details/Example: A collection in your document exceeds the limit on maximum elements across all complex collections. "The document with key '1000052' has '4303' objects in collections (JSON arrays). At most '3000' objects are allowed to be in collections across the entire document. Remove objects from collections and try indexing the document again."
Resolution: We recommend reducing the size of the complex collection in the document to below the limit and avoiding high storage utilization.

Reason: Trouble connecting to the target index that persists after retries, because the service is under other load, such as querying or indexing
Details/Example: "Failed to establish connection to update index. Search service is under heavy load."
Resolution: Scale up your search service.

Reason: The search service is being patched for a service update, or is in the middle of a topology reconfiguration
Details/Example: "Failed to establish connection to update index. Search service is currently down/Search service is undergoing a transition."
Resolution: Configure the service with at least 3 replicas for 99.9% availability, per the SLA documentation.

Reason: Failure in the underlying compute/networking resource (rare)
Details/Example: "Failed to establish connection to update index. An unknown failure occurred."
Resolution: Configure indexers to run on a schedule to pick up from a failed state.

Reason: An indexing request made to the target index wasn't acknowledged within a timeout period due to network issues
Details/Example: "Could not establish connection to the search index in a timely manner."
Resolution: Configure indexers to run on a schedule to pick up from a failed state. Additionally, try lowering the indexer batch size if this error condition persists.
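To find documents that are likely to hit the collection limit before you index them, you can count the objects held in arrays. This sketch is an approximation. It assumes that every object element of every complex collection counts toward the limit, including nested ones. The 3,000 figure comes from the error message above, and the helper is hypothetical.

```python
from typing import Any

MAX_COLLECTION_OBJECTS = 3000  # limit quoted in the error message above


def count_collection_objects(value: Any) -> int:
    """Count objects (dicts) that appear as elements of JSON arrays, at any depth."""
    if isinstance(value, dict):
        return sum(count_collection_objects(v) for v in value.values())
    if isinstance(value, list):
        total = 0
        for item in value:
            if isinstance(item, dict):
                total += 1
            # Recurse so nested collections inside elements are counted too.
            total += count_collection_objects(item)
        return total
    return 0


doc = {
    "id": "1000052",
    "rooms": [{"type": "suite", "tags": ["view"]} for _ in range(2000)],
    "reviews": [{"score": 5} for _ in range(2303)],
}
count = count_collection_objects(doc)
print(count, count > MAX_COLLECTION_OBJECTS)  # 4303 True
```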

Error: Could not index document because some of the document's data was not valid

The document was read and processed by the indexer, but due to a mismatch in the configuration of the index fields and the data extracted and processed by the indexer, it could not be added to the search index. This can happen due to:

Reason: Data type of the field(s) extracted by the indexer is incompatible with the data model of the corresponding target index field.
Details/Example: "The data field '_data_' in the document with key '888' has an invalid value 'of type 'Edm.String''. The expected type was 'Collection(Edm.String)'."

Reason: Failed to extract any JSON entity from a string value.
Details/Example: "Could not parse value 'of type 'Edm.String'' of field '_data_' as a JSON object. Error:'After parsing a value an unexpected character was encountered: ''. Path '_path_', line 1, position 3162.'"

Reason: Failed to extract a collection of JSON entities from a string value.
Details/Example: "Could not parse value 'of type 'Edm.String'' of field '_data_' as a JSON array. Error:'After parsing a value an unexpected character was encountered: ''. Path '[0]', line 1, position 27.'"

Reason: An unknown type was discovered in the source document.
Details/Example: "Unknown type '_unknown_' cannot be indexed"

Reason: An incompatible notation for geography points was used in the source document.
Details/Example: "WKT POINT string literals are not supported. Use GeoJson point literals instead"

In all these cases, refer to Supported Data types and Data type map for indexers to make sure that you build the index schema correctly and have set up appropriate indexer field mappings. The error message will include details that can help track down the source of the mismatch.
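For the geography-point case, this minimal Python sketch converts a two-dimensional WKT POINT literal into the GeoJSON point shape that Edm.GeographyPoint fields accept. It assumes you preprocess the data before the indexer reads it, for example in the source database or in a custom skill. It also assumes plain decimal coordinates. The function name is hypothetical.

```python
import re

# Matches e.g. "POINT(-122.1 47.6)" or "POINT (-122.1 47.6)"; WKT lists longitude first.
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_geojson(wkt: str) -> dict:
    """Convert a 2D WKT POINT literal to a GeoJSON Point object."""
    match = _WKT_POINT.match(wkt)
    if not match:
        raise ValueError(f"not a 2D WKT POINT: {wkt!r}")
    longitude, latitude = float(match.group(1)), float(match.group(2))
    # GeoJSON also puts longitude first: [longitude, latitude].
    return {"type": "Point", "coordinates": [longitude, latitude]}


print(wkt_point_to_geojson("POINT(-122.1 47.6)"))
# {'type': 'Point', 'coordinates': [-122.1, 47.6]}
```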

Error: Integrated change tracking policy cannot be used because table has a composite primary key

This applies to SQL tables, and usually happens when the key is defined as a composite key, or when the table has a unique clustered index (as in a SQL index, not an Azure Search index). The main reason is that, with a unique clustered index, the key attribute is modified to be a composite primary key. To fix this, make sure that your SQL table doesn't have a unique clustered index, or map the key field to a field that is guaranteed not to have duplicate values.

Error: Could not process document within indexer max run time

This error occurs when the indexer is unable to finish processing a single document from the data source within the allowed execution time. Maximum running time is shorter when skillsets are used. When this error occurs, if you have maxFailedItems set to a value other than 0, the indexer bypasses the document on future runs so that indexing can progress. If you can't afford to skip any document, or if you're seeing this error consistently, consider breaking documents into smaller documents so that partial progress can be made within a single indexer execution.

Error: Could not project document

This error occurs when the indexer is attempting to project data into a knowledge store and there was a failure on the attempt. This failure could be consistent and fixable, or it could be a transient failure with the projection output sink that you may need to wait and retry in order to resolve. Here's a set of known failure states and possible resolutions.

Reason: Could not update projection blob 'blobUri' in container 'containerName'
Details/Example: "The specified container doesn't exist."
Resolution: The indexer checks whether the specified container has been previously created and creates it if necessary, but it only performs this check once per indexer run. This error means that something deleted the container after this step. To resolve this error, leave your storage account information alone, wait for the indexer to finish, and then rerun the indexer.

Reason: Could not update projection blob 'blobUri' in container 'containerName'
Details/Example: "Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host."
Resolution: This is expected to be a transient failure with Azure Storage and should be resolved by rerunning the indexer. If you encounter this error consistently, file a support ticket so it can be investigated further.

Reason: Could not update row 'projectionRow' in table 'tableName'
Details/Example: "The server is busy."
Resolution: This is expected to be a transient failure with Azure Storage and should be resolved by rerunning the indexer. If you encounter this error consistently, file a support ticket so it can be investigated further.

Error: The cognitive service for skill '<skill-name>' has been throttled

Skill execution failed because the call to Cognitive Services was throttled. Typically, this class of failure occurs when too many skills are executing in parallel. If you're using the Azure.Search.Documents client library to run the indexer, you can use the SearchIndexingBufferedSender to get automatic retry on failed steps. Otherwise, you can reset and rerun the indexer.
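If you automate the "reset and rerun" approach, spacing out the retries gives the throttled service time to recover. The following generic retry-with-exponential-backoff sketch is a common pattern, not an SDK feature. You would pass a hypothetical `action` (for example, your own wrapper around the Run Indexer operation) and an `is_throttled` predicate. Treating an HTTP 429 response as throttling is an assumption about how your wrapper reports errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    is_throttled: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying throttled failures with exponential backoff and jitter.

    Non-throttled errors, and the last throttled error, are re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return action()
        except Exception as error:
            attempt += 1
            if attempt >= max_attempts or not is_throttled(error):
                raise
            # Wait base_delay * 2^(attempt - 1) seconds plus up to 1 second of random jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.random())
```

The `sleep` parameter is injectable so the delays can be observed or skipped in tests.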

Warning: Skill input was invalid

An input to the skill was missing, had the wrong type, or was otherwise invalid. The warning message indicates the impact:

1. Could not execute skill
2. Skill executed but may have unexpected results

Cognitive skills have required inputs and optional inputs. For example, the Key phrase extraction skill has two required inputs, text and languageCode, and no optional inputs. Custom skill inputs are all considered optional inputs.

If required inputs are missing or if the input isn't the right type, the skill gets skipped and generates a warning. Skipped skills don't generate outputs. If downstream skills consume the outputs of the skipped skill, they may generate additional warnings.

If an optional input is missing, the skill still runs but may produce unexpected output due to the missing input.

In both cases, this warning may be expected due to the shape of your data. For example, if you have a document containing information about people with the fields firstName, middleName, and lastName, you may have some documents which don't have an entry for middleName. If you pass middleName as an input to a skill in the pipeline, then it's expected that this skill input may be missing some of the time. You will need to evaluate your data and scenario to determine whether or not any action is required as a result of this warning.

If you want to provide a default value in case of missing input, you can use the Conditional skill to generate a default value and then use the output of the Conditional skill as the skill input.

{
    "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
    "context": "/document",
    "inputs": [
        { "name": "condition", "source": "= $(/document/language) == null" },
        { "name": "whenTrue", "source": "= 'en'" },
        { "name": "whenFalse", "source": "= $(/document/language)" }
    ],
    "outputs": [ { "name": "output", "targetName": "languageWithDefault" } ]
}

Reason: Skill input is the wrong type
Details/Example: "Required skill input was not of the expected type String. Name: text, Source: /document/merged_content."; "Required skill input was not of the expected format. Name: text, Source: /document/merged_content."; "Cannot iterate over non-array /document/normalized_images/0/imageCelebrities/0/detail/celebrities."; "Unable to select 0 in non-array /document/normalized_images/0/imageCelebrities/0/detail/celebrities"
Resolution: Certain skills expect inputs of particular types; for example, the Sentiment skill expects text to be a string. If the input specifies a non-string value, the skill doesn't execute and generates no outputs. Ensure your data set has input values of a uniform type, or use a Custom Web API skill to preprocess the input. If you're iterating the skill over an array, check that the skill context and input have * in the correct positions. Usually both the context and the input source should end with * for arrays.

Reason: Skill input is missing
Details/Example: "Required skill input is missing. Name: text, Source: /document/merged_content"; "Missing value /document/normalized_images/0/imageTags."; "Unable to select 0 in array /document/pages of length 0."
Resolution: If this warning occurs for all documents, there could be a typo in the input paths. Check the property name casing. Check for an extra or missing * in the path. Verify that the documents from the data source provide the required inputs.

Reason: Skill language code input is invalid
Details/Example: "Skill input languageCode has the following language codes X,Y,Z, at least one of which is invalid."
Resolution: See the next section for details.

Warning: Skill input 'languageCode' has the following language codes 'X,Y,Z', at least one of which is invalid.

One or more of the values passed into the optional languageCode input of a downstream skill isn't supported. This can occur if you're passing the output of the LanguageDetectionSkill to subsequent skills, and the output consists of more languages than are supported in those downstream skills.

Note that you may also get a warning similar to this one if an invalid countryHint input gets passed to the LanguageDetectionSkill. If that happens, validate that the field you're using from your data source for that input contains valid ISO 3166-1 alpha-2 two letter country codes. If some are valid and some are invalid, continue with the following guidance but replace languageCode with countryHint and defaultLanguageCode with defaultCountryHint to match your use case.

If you know that your data set is all in one language, you should remove the LanguageDetectionSkill and the languageCode skill input and use the defaultLanguageCode skill parameter for that skill instead, assuming the language is supported for that skill.

If you know that your data set contains multiple languages and thus you need the LanguageDetectionSkill and languageCode input, consider adding a ConditionalSkill to filter out the text with languages that are not supported before passing in the text to the downstream skill. Here's an example of what this might look like for the EntityRecognitionSkill:

{
    "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
    "context": "/document",
    "inputs": [
        { "name": "condition", "source": "= $(/document/language) == 'de' || $(/document/language) == 'en' || $(/document/language) == 'es' || $(/document/language) == 'fr' || $(/document/language) == 'it'" },
        { "name": "whenTrue", "source": "/document/content" },
        { "name": "whenFalse", "source": "= null" }
    ],
    "outputs": [ { "name": "output", "targetName": "supportedByEntityRecognitionSkill" } ]
}
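If the list of supported languages is long or changes over time, generating the condition string can be less error-prone than writing it by hand. This Python sketch reproduces the condition from the example above. The language codes are only the ones used in that example, so check the downstream skill's language support before you rely on them. The `language_condition` helper is hypothetical.

```python
from typing import Iterable


def language_condition(supported: Iterable[str], source: str = "/document/language") -> str:
    """Build a ConditionalSkill condition that is true when the language is in `supported`."""
    clauses = [f"$({source}) == '{code}'" for code in supported]
    if not clauses:
        raise ValueError("at least one language code is required")
    return "= " + " || ".join(clauses)


condition = language_condition(["de", "en", "es", "fr", "it"])
conditional_skill = {
    "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
    "context": "/document",
    "inputs": [
        {"name": "condition", "source": condition},
        {"name": "whenTrue", "source": "/document/content"},
        {"name": "whenFalse", "source": "= null"},
    ],
    "outputs": [{"name": "output", "targetName": "supportedByEntityRecognitionSkill"}],
}
```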

For the currently supported languages, check the language support documentation for each skill that may produce this warning.

Warning: Skill input was truncated

Cognitive skills limit the length of text that can be analyzed at one time. If the text input exceeds the limit, the text is truncated before it's enriched. The skill executes, but not over all of your data.

In the example LanguageDetectionSkill below, the 'text' input field might trigger this warning if the input is over the character limit. Input limits can be found in the skills reference documentation.

 {
    "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
    "inputs": [
      {
        "name": "text",
        "source": "/document/text"
      }
    ],
    "outputs": [...]
  }

If you want to ensure that all text is analyzed, consider using the Split skill.
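The following minimal sketch assumes you split `/document/content` into pages and then run language detection on each page. The dicts mirror the Split skill (`textSplitMode`, `maximumPageLength`, output `textItems`) and the LanguageDetectionSkill shape above. The page length of 5000 and the `pages` and `language` target names are illustrative choices. The `chunk_text` helper is a hypothetical offline approximation for estimating how many pages a document produces. It doesn't reproduce the Split skill's exact page boundaries.

```python
from typing import List

# Split /document/content into pages; "pages" and 5000 are illustrative choices.
split_skill = {
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "context": "/document",
    "textSplitMode": "pages",
    "maximumPageLength": 5000,
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "textItems", "targetName": "pages"}],
}

# Run language detection once per page: both context and input source end with /*.
language_detection = {
    "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
    "context": "/document/pages/*",
    "inputs": [{"name": "text", "source": "/document/pages/*"}],
    "outputs": [{"name": "languageCode", "targetName": "language"}],
}


def chunk_text(text: str, max_length: int) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most max_length characters.

    Words longer than max_length are hard-split.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        while len(word) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_length])
            word = word[max_length:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_length:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```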

Warning: Web API skill response contains warnings

The indexer ran the skill in the skillset, but the response from the Web API request indicates there are warnings. Review the warnings to understand how your data is impacted and whether further action is required.

Warning: The current indexer configuration does not support incremental progress

This warning only occurs for Azure Cosmos DB data sources.

Incremental progress during indexing ensures that if indexer execution is interrupted by transient failures or execution time limit, the indexer can pick up where it left off next time it runs, instead of having to re-index the entire collection from scratch. This is especially important when indexing large collections.

The ability to resume an unfinished indexing job depends on having documents ordered by the _ts column. The indexer uses the timestamp to determine which document to pick up next. If the _ts column is missing, or if the indexer can't determine whether a custom query is ordered by it, the indexer starts at the beginning and you'll see this warning.

It's possible to override this behavior, enabling incremental progress and suppressing this warning by using the assumeOrderByHighWaterMarkColumn configuration property.

For more information, see Incremental progress and custom queries.
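This sketch shows the pieces involved, written as Python dicts in the shape of the REST payloads: a Cosmos DB data source whose custom query orders by `_ts`, and the indexer `parameters.configuration` property that enables incremental progress. The names are placeholders and the credentials are omitted. The `looks_ordered_by_ts` check is a hypothetical heuristic and isn't how the indexer itself decides. Only set assumeOrderByHighWaterMarkColumn when the query really is ordered by _ts, because the indexer relies on that ordering to decide where to resume.

```python
import re

# Placeholder names; connection credentials are omitted from this fragment.
datasource = {
    "name": "my-cosmosdb-datasource",
    "type": "cosmosdb",
    "container": {
        "name": "my-collection",
        # Custom query ordered by the high-water-mark column.
        "query": "SELECT * FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts",
    },
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName": "_ts",
    },
}

# Indexer parameters: tell the indexer to trust that the query is ordered by _ts.
indexer_parameters = {
    "configuration": {"assumeOrderByHighWaterMarkColumn": True},
}

_ORDER_BY_TS = re.compile(r"\bORDER\s+BY\s+\w+\._ts(?:\s+ASC)?\s*$", re.IGNORECASE)


def looks_ordered_by_ts(query: str) -> bool:
    """Rough heuristic: does the query end with an ascending ORDER BY on _ts?"""
    return _ORDER_BY_TS.search(query) is not None
```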

Warning: Some data was lost during projection. Row 'X' in table 'Y' has string property 'Z' which was too long.

The Table Storage service has limits on how large entity properties can be. Strings can have at most 32,000 characters. If a row with a string property longer than 32,000 characters is being projected, only the first 32,000 characters are preserved. To work around this issue, avoid projecting rows with string properties longer than 32,000 characters.

Warning: Truncated extracted text to X characters

Indexers limit how much text can be extracted from any one document. This limit depends on the pricing tier: 32,000 characters for Free, 64,000 for Basic, 4 million for Standard S1, 8 million for Standard S2, and 16 million for Standard S3. Text that was truncated won't be indexed. To avoid this warning, try breaking apart documents with large amounts of text into multiple, smaller documents.

For more information, see Indexer limits.
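To estimate how much text a document would lose, and roughly how many pieces you would need to split it into, you can apply the tier limits listed above. The table in this sketch copies only those figures. The tier keys and helper names are hypothetical, and the piece count assumes the text can be divided evenly.

```python
# Extraction limits quoted above, in characters, keyed by pricing tier.
EXTRACTION_LIMITS = {
    "free": 32_000,
    "basic": 64_000,
    "standard": 4_000_000,  # Standard (S1)
    "s2": 8_000_000,
    "s3": 16_000_000,
}


def characters_lost(text_length: int, tier: str) -> int:
    """Return how many extracted characters would be truncated for the given tier."""
    limit = EXTRACTION_LIMITS[tier.lower()]
    return max(0, text_length - limit)


def pieces_needed(text_length: int, tier: str) -> int:
    """Smallest number of equal-sized documents that would each fit under the limit."""
    limit = EXTRACTION_LIMITS[tier.lower()]
    return max(1, -(-text_length // limit))  # ceiling division


print(characters_lost(100_000, "Basic"), pieces_needed(100_000, "Basic"))  # 36000 2
```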

Warning: Could not map output field 'X' to search index

Output field mappings that reference non-existent/null data will produce warnings for each document and result in an empty index field. To work around this issue, double-check your output field-mapping source paths for possible typos, or set a default value using the Conditional skill. See Output field mapping for details.

Reason: Cannot iterate over non-array
Details/Example: "Cannot iterate over non-array /document/normalized_images/0/imageCelebrities/0/detail/celebrities."
Resolution: This error occurs when the output isn't an array. If you think the output should be an array, check the indicated output source field path for errors. For example, you might have a missing or extra * in the source field name. It's also possible that the input to this skill is null, resulting in an empty array. Find similar details in the Skill input was invalid section.

Reason: Unable to select 0 in non-array
Details/Example: "Unable to select 0 in non-array /document/pages."
Resolution: This can happen if the skill's output isn't an array and the output source field name has an array index or * in its path. Double-check the paths provided in the output source field names and the field value for the indicated field name. Find similar details in the Skill input was invalid section.

Warning: The data change detection policy is configured to use key column 'X'

Data change detection policies have specific requirements for the columns they use to detect change. One of these requirements is that this column is updated every time the source item is changed. Another requirement is that the new value for this column is greater than the previous value. Key columns don't fulfill this requirement because they don't change on every update. To work around this issue, select a different column for the change detection policy.

Warning: Document text appears to be UTF-16 encoded, but is missing a byte order mark

The indexer parsing modes need to know how text is encoded before parsing it. The two most common ways of encoding text are UTF-16 and UTF-8. UTF-8 is a variable-length encoding where each character is between 1 and 4 bytes long. UTF-16 uses 2-byte code units: most characters take 2 bytes, and characters outside the Basic Multilingual Plane take 4 bytes (a surrogate pair). UTF-16 has two different variants, "big endian" and "little endian". The encoding is indicated by a "byte order mark", a series of bytes before the text.

Byte order mark by encoding:
UTF-16 big endian: 0xFE 0xFF
UTF-16 little endian: 0xFF 0xFE
UTF-8: 0xEF 0xBB 0xBF

If no byte order mark is present, the text is assumed to be encoded as UTF-8.

To work around this warning, determine what the text encoding for this blob is and add the appropriate byte order mark.
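This minimal Python sketch checks a blob's bytes before upload. It reports an existing byte order mark, applies a simple heuristic for BOM-less UTF-16, and prepends the matching mark. The heuristic relies on mostly ASCII text leaving a zero byte in every other position. That is an assumption, and it can misjudge text that is mostly non-Latin. When you can, confirm the encoding with whoever produced the file. The helper names are hypothetical.

```python
import codecs
from typing import Optional

_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading byte order mark, or None if there isn't one."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return None


def looks_like_utf16(data: bytes) -> Optional[str]:
    """Heuristic guess for BOM-less UTF-16 based on where zero bytes fall."""
    sample = data[:4096]
    even_zeros = sample[0::2].count(0)
    odd_zeros = sample[1::2].count(0)
    threshold = len(sample) // 4  # more than half of the 2-byte code units
    if odd_zeros > threshold and even_zeros == 0:
        return "utf-16-le"  # ASCII "A" in little endian is 41 00
    if even_zeros > threshold and odd_zeros == 0:
        return "utf-16-be"  # ASCII "A" in big endian is 00 41
    return None


def add_bom(data: bytes, encoding: str) -> bytes:
    """Prefix data with the byte order mark for a UTF-16 encoding, if it isn't already there."""
    bom = {"utf-16-le": codecs.BOM_UTF16_LE, "utf-16-be": codecs.BOM_UTF16_BE}[encoding]
    return data if data.startswith(bom) else bom + data


raw = "hello world".encode("utf-16-le")  # UTF-16 text without a byte order mark
if detect_bom(raw) is None:
    guess = looks_like_utf16(raw)
    if guess:
        raw = add_bom(raw, guess)
print(detect_bom(raw))  # utf-16-le
```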

Warning: Azure Cosmos DB collection 'X' has a Lazy indexing policy. Some data may be lost

Collections with Lazy indexing policies can't be queried consistently, resulting in your indexer missing data. To work around this warning, change your indexing policy to Consistent.

Warning: The document contains very long words (longer than 64 characters). These words may result in truncated and/or unreliable model predictions.

This warning is passed from the Language service of Azure Cognitive Services. In some cases, it's safe to ignore this warning, such as when your document contains a long URL (which likely isn't a key phrase or a driver of sentiment). Be aware that when a word is longer than 64 characters, it's truncated to 64 characters, which can affect model predictions.
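To see which documents trigger this warning, you can scan the text for tokens longer than 64 characters before indexing. This sketch splits on whitespace only, which is an assumption, since the Language service tokenizes text its own way. Treat the result as an estimate.

```python
from typing import List

MAX_WORD_LENGTH = 64  # words longer than this are truncated by the Language service


def long_words(text: str, limit: int = MAX_WORD_LENGTH) -> List[str]:
    """Return whitespace-delimited tokens longer than `limit` characters."""
    return [token for token in text.split() if len(token) > limit]


sample = "See https://example.com/" + "a" * 80 + " for details."
for word in long_words(sample):
    print(len(word), word[:40] + "...")
```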