I'm running into the same issue! It first occurred around March 15th. I have opened a support request (2503180040013121) about it. I'll reply here if/when there's a resolution.
Azure AI Search Indexer from Blob Storage failing with 'negative Length' error despite running successfully a few days ago and no config / blob changes since then.
I have an indexer which indexes HTML files from Blob Storage with the following JSON:
```json
{
  "@odata.context": "<context>",
  "@odata.etag": "\"0x8DD66325859E402\"",
  "name": "<name>",
  "description": null,
  "dataSourceName": "<datasource>",
  "skillsetName": "<skillset>",
  "targetIndexName": "<index>",
  "disabled": null,
  "schedule": null,
  "parameters": {
    "batchSize": null,
    "maxFailedItems": null,
    "maxFailedItemsPerBatch": null,
    "base64EncodeKeys": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "default"
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_name",
      "targetFieldName": "title",
      "mappingFunction": null
    }
  ],
  "outputFieldMappings": [],
  "cache": null,
  "encryptionKey": null
}
```
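To narrow down which documents fail and with what error, one option is to query the indexer's execution history through the Azure AI Search REST API ("Get Indexer Status"). This is a minimal sketch; the `<service>`, `<name>`, and `<admin-key>` placeholders and the `api-version` value are my assumptions, not details from the post.

```python
# Sketch: fetch an indexer's execution history to list per-document errors.
# Placeholders (<service>, <name>, <admin-key>) and api-version are assumed.
import json
import urllib.request

def build_status_url(service: str, indexer: str,
                     api_version: str = "2024-07-01") -> str:
    # "Get Indexer Status" endpoint: /indexers/{name}/search.status
    return (f"https://{service}.search.windows.net/indexers/"
            f"{indexer}/search.status?api-version={api_version}")

def get_indexer_status(service: str, indexer: str, api_key: str) -> dict:
    req = urllib.request.Request(build_status_url(service, indexer),
                                 headers={"api-key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires a real service and admin key):
# status = get_indexer_status("<service>", "<name>", "<admin-key>")
# for err in status["lastResult"]["errors"]:
#     print(err.get("key"), err.get("errorMessage"))
```

The `errors` array in `lastResult` reports the document key alongside the error message, which makes it easier to correlate failures with specific blobs than the portal view.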
My skillset looks like this:
```json
{
  "@odata.etag": "\"0x8DD6367D8C35B3A\"",
  "name": "<name>",
  "description": "Skillset to chunk documents and generate embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#1",
      "description": "Split skill to chunk documents",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 96,
      "pageOverlapLength": 47,
      "maximumPagesToTake": 0,
      "unit": "azureOpenAITokens",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ],
      "azureOpenAITokenizerParameters": {
        "encoderModelName": "cl100k_base",
        "allowedSpecialTokens": []
      }
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "#2",
      "context": "/document/pages/*",
      "resourceUri": "<deployment>",
      "apiKey": "<redacted>",
      "deploymentId": "text-embedding-3-large",
      "dimensions": 3072,
      "modelName": "text-embedding-3-large",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "text_vector"
        }
      ]
    }
  ],
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "<index>",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/pages/*",
        "mappings": [
          { "name": "text_vector", "source": "/document/pages/*/text_vector", "inputs": [] },
          { "name": "chunk", "source": "/document/pages/*", "inputs": [] },
          { "name": "title", "source": "/document/title", "inputs": [] },
          { "name": "Url", "source": "/document/Url", "inputs": [] },
          { "name": "CourseAzureFileName", "source": "/document/CourseAzureFileName", "inputs": [] },
          { "name": "CourseCategoryId", "source": "/document/CourseCategoryId", "inputs": [] },
          { "name": "Level", "source": "/document/Level", "inputs": [] },
          { "name": "Type", "source": "/document/Type", "inputs": [] },
          { "name": "IsAccess", "source": "/document/IsAccess", "inputs": [] },
          { "name": "IsOnline", "source": "/document/IsOnline", "inputs": [] }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  }
}
```
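For context on what the SplitSkill settings above control: `maximumPageLength: 96` with `pageOverlapLength: 47` produces a sliding window where each page shares 47 tokens with the previous one. The sketch below is a rough character-based analogue of that windowing, not Microsoft's implementation (the real skill counts `azureOpenAITokens` with the `cl100k_base` encoder, and the negative-length error happens inside the service).

```python
# Rough character-based analogue of "pages" splitting with overlap.
# The real SplitSkill counts cl100k_base tokens, not characters; this
# only illustrates how maximumPageLength / pageOverlapLength interact.
def split_pages(text: str, max_len: int = 96, overlap: int = 47) -> list[str]:
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than the page length")
    step = max_len - overlap          # how far the window advances per page
    pages = []
    start = 0
    while start < len(text):
        pages.append(text[start:start + max_len])
        start += step
    return pages

# Each page is at most max_len units long, and consecutive pages
# share the last `overlap` units of the previous page.
print(split_pages("abcdefgh", max_len=4, overlap=2))
```

Note that the window arithmetic (`max_len - overlap`) is exactly the kind of length calculation that, if it ever went wrong server-side, could surface as a "length must be non-negative" error, which is consistent with the error scaling with chunk size as reported below.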
I have not changed the indexer or skillset since last week, when it had a successful run. Today I ran the indexer again, without changing any of the files in blob storage or any config in the indexer/skillset/data source/index, and it failed on one file after succeeding on 60, with the following error:

length ('-163') must be a non-negative value. (Parameter 'length'). Actual value was -163.
What is the 'length' being referred to here, and why is it suddenly erroring? I tried reverting the failing file to an older snapshot; that file then passed, but the indexer still failed on another.
At first I thought this might be due to the automatic removal of tags in the HTML that happens during indexing, but I've since tried preprocessing the HTML by taking just the innerText and replacing all my files in blob storage with that plain text. I still get the same error, although on different files.
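For anyone wanting to reproduce the preprocessing step described above: the post doesn't say how the innerText was extracted (possibly via the browser), but a minimal standard-library equivalent looks like this. This is my own illustrative sketch, not the author's script.

```python
# Minimal stdlib sketch of stripping HTML tags to approximate innerText
# before re-uploading files to blob storage. Not the author's actual tool.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Collect only the text nodes between tags.
        self.parts.append(data)

def inner_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "".join(parser.parts).strip()

print(inner_text("<p>Hello <b>world</b></p>"))  # Hello world
```

A real pipeline would also drop the contents of `<script>` and `<style>` elements, which this simple version keeps; the point here is only that even tag-free plain text still triggered the error.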
Is this a bug, or has something changed in the way indexing occurs in the last few days? Any help would be greatly appreciated.
Azure AI Search
3 answers
Fred Brugmans
2025-03-19T10:31:47.5733333+00:00
Hi @Bhargavi Naragani, thanks for your reply. The documents I'm working with are very short; the one it is failing on is less than 1,000 characters, so I don't think that's the problem. I have tried changing the chunk size, and that affects which document fails, but it does not stop them failing altogether. Increasing the chunk size also seems to produce a more negative 'length', e.g. -1401 for chunks of 512 tokens. I've tried creating a fresh index using the UI in the Azure portal, and that also fails. Has something changed in the background since 14th March? The error message makes no sense: how can a length be negative, and with such a specific number?
Kent Johnson
2025-03-24T16:16:59.6566667+00:00
I've definitely noticed that certain Markdown files consistently trigger it, though I haven't spotted a pattern in what about those files causes it.