I'm running into the same issue! It first occurred around March 15th. I have opened a support request (2503180040013121) about it. I'll reply here if/when there's a resolution.
Azure AI Search Indexer from Blob Storage failing with 'negative Length' error despite running successfully a few days ago and no config / blob changes since then.
I have an indexer which indexes HTML files from Blob Storage with the following JSON:
```json
{
  "@odata.context": "<context>",
  "@odata.etag": "\"0x8DD66325859E402\"",
  "name": "<name>",
  "description": null,
  "dataSourceName": "<datasource>",
  "skillsetName": "<skillset>",
  "targetIndexName": "<index>",
  "disabled": null,
  "schedule": null,
  "parameters": {
    "batchSize": null,
    "maxFailedItems": null,
    "maxFailedItemsPerBatch": null,
    "base64EncodeKeys": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "default"
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_name",
      "targetFieldName": "title",
      "mappingFunction": null
    }
  ],
  "outputFieldMappings": [],
  "cache": null,
  "encryptionKey": null
}
```
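To narrow down which documents fail and with what error, one option is to query the indexer's execution history through the Azure AI Search REST API ("Get Indexer Status"). This is a minimal sketch; the `<service>`, `<name>`, and `<admin-key>` placeholders and the `api-version` value are my assumptions, not details from the post.

```python
# Sketch: fetch an indexer's execution history to list per-document errors.
# Placeholders (<service>, <name>, <admin-key>) and api-version are assumed.
import json
import urllib.request

def build_status_url(service: str, indexer: str,
                     api_version: str = "2024-07-01") -> str:
    # "Get Indexer Status" endpoint: /indexers/{name}/search.status
    return (f"https://{service}.search.windows.net/indexers/"
            f"{indexer}/search.status?api-version={api_version}")

def get_indexer_status(service: str, indexer: str, api_key: str) -> dict:
    req = urllib.request.Request(build_status_url(service, indexer),
                                 headers={"api-key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires a real service and admin key):
# status = get_indexer_status("<service>", "<name>", "<admin-key>")
# for err in status["lastResult"]["errors"]:
#     print(err.get("key"), err.get("errorMessage"))
```

The `errors` array in `lastResult` reports the document key alongside the error message, which makes it easier to correlate failures with specific blobs than the portal view.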
My skillset looks like this:
```json
{
  "@odata.etag": "\"0x8DD6367D8C35B3A\"",
  "name": "<name>",
  "description": "Skillset to chunk documents and generate embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#1",
      "description": "Split skill to chunk documents",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 96,
      "pageOverlapLength": 47,
      "maximumPagesToTake": 0,
      "unit": "azureOpenAITokens",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ],
      "azureOpenAITokenizerParameters": {
        "encoderModelName": "cl100k_base",
        "allowedSpecialTokens": []
      }
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "#2",
      "context": "/document/pages/*",
      "resourceUri": "<deployment>",
      "apiKey": "<redacted>",
      "deploymentId": "text-embedding-3-large",
      "dimensions": 3072,
      "modelName": "text-embedding-3-large",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "text_vector"
        }
      ]
    }
  ],
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "<index>",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/pages/*",
        "mappings": [
          { "name": "text_vector", "source": "/document/pages/*/text_vector", "inputs": [] },
          { "name": "chunk", "source": "/document/pages/*", "inputs": [] },
          { "name": "title", "source": "/document/title", "inputs": [] },
          { "name": "Url", "source": "/document/Url", "inputs": [] },
          { "name": "CourseAzureFileName", "source": "/document/CourseAzureFileName", "inputs": [] },
          { "name": "CourseCategoryId", "source": "/document/CourseCategoryId", "inputs": [] },
          { "name": "Level", "source": "/document/Level", "inputs": [] },
          { "name": "Type", "source": "/document/Type", "inputs": [] },
          { "name": "IsAccess", "source": "/document/IsAccess", "inputs": [] },
          { "name": "IsOnline", "source": "/document/IsOnline", "inputs": [] }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  }
}
```
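For context on what the SplitSkill settings above control: `maximumPageLength: 96` with `pageOverlapLength: 47` produces a sliding window where each page shares 47 tokens with the previous one. The sketch below is a rough character-based analogue of that windowing, not Microsoft's implementation (the real skill counts `azureOpenAITokens` with the `cl100k_base` encoder, and the negative-length error happens inside the service).

```python
# Rough character-based analogue of "pages" splitting with overlap.
# The real SplitSkill counts cl100k_base tokens, not characters; this
# only illustrates how maximumPageLength / pageOverlapLength interact.
def split_pages(text: str, max_len: int = 96, overlap: int = 47) -> list[str]:
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than the page length")
    step = max_len - overlap          # how far the window advances per page
    pages = []
    start = 0
    while start < len(text):
        pages.append(text[start:start + max_len])
        start += step
    return pages

# Each page is at most max_len units long, and consecutive pages
# share the last `overlap` units of the previous page.
print(split_pages("abcdefgh", max_len=4, overlap=2))
```

Note that the window arithmetic (`max_len - overlap`) is exactly the kind of length calculation that, if it ever went wrong server-side, could surface as a "length must be non-negative" error, which is consistent with the error scaling with chunk size as reported below.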
I have not changed the indexer or skillset since last week, when it had a successful run. Today I ran the indexer again, without changing any of the files in blob storage or any config in the indexer/skillset/data source/index, and it failed on one file after succeeding on 60, with the following error:

length ('-163') must be a non-negative value. (Parameter 'length'). Actual value was -163.
What is the 'length' being referred to here, and why is it suddenly erroring? I tried reverting the failing file to an older snapshot; that file then passed, but the indexer still failed on another.
At first I thought this might be due to the automatic removal of tags in the HTML that happens during indexing, but I've since tried preprocessing the HTML by taking just the innerText and replacing all my files in blob storage with that plain text. I still get the same error, although on different files.
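For anyone wanting to reproduce the preprocessing step described above: the post doesn't say how the innerText was extracted (possibly via the browser), but a minimal standard-library equivalent looks like this. This is my own illustrative sketch, not the author's script.

```python
# Minimal stdlib sketch of stripping HTML tags to approximate innerText
# before re-uploading files to blob storage. Not the author's actual tool.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Collect only the text nodes between tags.
        self.parts.append(data)

def inner_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "".join(parser.parts).strip()

print(inner_text("<p>Hello <b>world</b></p>"))  # Hello world
```

A real pipeline would also drop the contents of `<script>` and `<style>` elements, which this simple version keeps; the point here is only that even tag-free plain text still triggered the error.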
Is this a bug, or has something changed in the way indexing occurs in the last few days? Any help would be greatly appreciated.
Azure AI Search
3 answers
Fred Brugmans
2025-03-19T10:31:47.5733333+00:00
Hi @Bhargavi Naragani, thanks for your reply. The documents I'm working with are very short; the one it is failing on is less than 1,000 characters, so I don't think that's the problem. I have tried changing the chunk size, and that affects which document fails, but it does not stop them failing altogether. Increasing the chunk size also seems to produce a more negative 'length', e.g. -1401 for chunks of 512 tokens. I've tried creating a fresh index using the UI in the Azure portal, and that also fails. Has something changed in the background since 14th March? The error message makes no sense: how can a length be negative, and with such a specific number?
Kent Johnson
2025-03-24T16:16:59.6566667+00:00
I've definitely noticed that certain Markdown files consistently trigger it, though I haven't spotted a pattern in what about those files causes it.