Hello George R,
Welcome to Microsoft Q&A, and thank you for sharing the additional details and the screenshot.
Based on the behavior you’re seeing, where SDS_Recogniser_V2 fails repeatedly with “InternalServerError: An unexpected error occurred” while other models such as TDS_Recogniser train without issues, this points to a problem specific to the SDS_Recogniser_V2 model version or its training dataset rather than to training limits or document volume.
Below is a consolidated breakdown of what’s going on and how you can proceed.
What we can confirm from your screenshot
- TDS_Recogniser trained successfully on Nov 17.
- SDS_Recogniser_V3 is currently running.
- Multiple SDS_Recogniser_V2 training attempts failed almost instantly (within seconds or minutes).
This indicates the failure is not caused by:
- The 30-hour training timeout
- The number of documents/pages
- The API version
- Region or performance issues
The failure occurs before the training pipeline even begins, which strongly suggests an issue with the dataset or the backend state of that specific model version.
Most likely causes of repeated early failures
Based on similar cases, the most common reasons are:
1. A corrupted or unsupported document in the SDS V2 dataset
Even a single file that is:
- partially corrupted
- password-protected
- a malformed PDF
- extremely large
- an image that OCR cannot read
can cause an instant generic error with no detailed message.
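If you keep a local copy of the SDS V2 training documents, a quick scan can surface most of these problem files before the next run. The sketch below is only a rough starting point: it assumes the documents are PDFs sitting in a local folder and that the pypdf package is installed, and the folder path and size threshold are placeholders you would adjust.

```python
# Sketch: scan a local copy of the SDS V2 training folder for files that
# commonly trigger an instant, generic training failure.
# Assumes PDFs downloaded locally and pypdf installed (pip install pypdf).
from pathlib import Path

from pypdf import PdfReader

DATASET_DIR = Path("./sds_v2_dataset")   # hypothetical local copy of the container
MAX_SIZE_MB = 200                        # flag unusually large files for manual review

for pdf_path in sorted(DATASET_DIR.glob("**/*.pdf")):
    size_mb = pdf_path.stat().st_size / (1024 * 1024)
    if pdf_path.stat().st_size == 0:
        print(f"ZERO-BYTE FILE    : {pdf_path}")
        continue
    if size_mb > MAX_SIZE_MB:
        print(f"VERY LARGE FILE ({size_mb:.0f} MB): {pdf_path}")
    try:
        reader = PdfReader(str(pdf_path))
        if reader.is_encrypted:
            print(f"PASSWORD-PROTECTED: {pdf_path}")
        elif len(reader.pages) == 0:
            print(f"NO READABLE PAGES : {pdf_path}")
    except Exception as exc:  # malformed / partially corrupted PDFs land here
        print(f"CORRUPT/MALFORMED : {pdf_path} ({exc})")
```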
2. Label schema mismatches
If some fields were renamed or removed between versions, or certain documents are missing labels that are not marked as optional, the training pipeline will fail during validation.
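A quick way to rule this out is to compare the field keys declared in fields.json against the labels actually used in each *.labels.json file. The sketch below assumes a local copy of the labeling files and the common Document Intelligence labeling layout (a top-level "fields" list in fields.json and a top-level "labels" list per document); exact key names can vary with the labeling-tool version, so treat this as a starting point rather than a definitive checker.

```python
# Sketch: check that every *.labels.json only uses field names declared in fields.json.
# Assumes the common layout where fields.json has a "fields" list (entries with a
# "fieldKey") and each *.labels.json has a "labels" list (entries with a "label").
import json
from pathlib import Path

DATASET_DIR = Path("./sds_v2_dataset")   # hypothetical local copy of the container

fields = json.loads((DATASET_DIR / "fields.json").read_text(encoding="utf-8"))
declared = {f["fieldKey"] for f in fields.get("fields", [])}

for label_file in sorted(DATASET_DIR.glob("**/*.labels.json")):
    labels = json.loads(label_file.read_text(encoding="utf-8"))
    # Table/nested labels can appear as paths like "Items/0/Price",
    # so compare only the first path segment against the declared keys.
    used = {entry["label"].split("/")[0] for entry in labels.get("labels", [])}

    unknown = used - declared          # labels that no longer exist in the schema
    missing = declared - used          # declared fields this document never labels
    if unknown:
        print(f"{label_file.name}: uses undeclared fields {sorted(unknown)}")
    if missing:
        print(f"{label_file.name}: never labels {sorted(missing)} (fine only if optional)")
```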
3. Model version metadata corruption
Sometimes a specific version (V2) becomes “stuck” on the backend. This kind of corruption explains why:
- V2 fails repeatedly
- V3, a new version, is able to run successfully
4. A backend service issue
InternalServerError is a generic fallback when the training service can’t generate a detailed error message. This can happen if there is a pipeline failure in the ingestion/indexing stage.
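One thing worth checking before anything else: the portal surfaces only the generic message, but the underlying training operation sometimes carries a more specific error code. Here is a hedged sketch using the azure-ai-formrecognizer Python SDK (3.2+); the endpoint and key are placeholders, and the attribute names should be double-checked against the SDK version you have installed.

```python
# Sketch: pull the underlying error details for recent failed training operations,
# which are often more specific than the portal's generic "InternalServerError".
from azure.ai.formrecognizer import DocumentModelAdministrationClient
from azure.core.credentials import AzureKeyCredential

client = DocumentModelAdministrationClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

for op in client.list_operations():
    if op.kind == "documentModelBuild" and op.status == "failed":
        details = client.get_operation(op.operation_id)
        err = details.error
        if err is not None:
            print(op.operation_id, err.code, err.message)
```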
Troubleshooting Steps:
1. Continue with SDS_Recogniser_V3
V3 is running, which suggests V2’s metadata or internal state is corrupted. If V3 succeeds, that confirms the issue is isolated to SDS_Recogniser_V2.
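If you ever need to rebuild from code rather than from the portal (for example, to move the V2 dataset onto a completely fresh model ID), you can start training with the same SDK. Again, this is only a sketch: the build mode, endpoint, key, container SAS URL, and new model ID are all placeholders to replace with your own values.

```python
# Sketch: build a fresh custom model under a new model ID instead of reusing
# the possibly-corrupted SDS_Recogniser_V2 entry.
from azure.ai.formrecognizer import DocumentModelAdministrationClient, ModelBuildMode
from azure.core.credentials import AzureKeyCredential

client = DocumentModelAdministrationClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

poller = client.begin_build_document_model(
    ModelBuildMode.TEMPLATE,                   # or ModelBuildMode.NEURAL, whichever you used
    blob_container_url="<container-sas-url>",  # same training container as before
    model_id="SDS_Recogniser_V3",              # a new, unused model ID
    description="Rebuild of SDS recogniser after repeated V2 failures",
)
model = poller.result()
print(model.model_id, model.created_on)
```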
2. Validate the training dataset
Please check that:
- No PDF/image is password-protected
- No file is 0 bytes
- No file has unusually large or damaged pages
- All labels exist across every sample (or are marked optional)
- All samples use the same schema version
Also try isolating documents that were added recently or that are known to have quality issues; the snippet below shows one way to spot the most recently uploaded files.
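To find the newest uploads, you can list the training container's blobs with their last-modified timestamps and review (or temporarily move out) the most recent ones before retraining. A small sketch using the azure-storage-blob package; the container SAS URL is a placeholder.

```python
# Sketch: list training documents by upload time so recently added files can be
# reviewed or temporarily excluded before the next training run.
from azure.storage.blob import ContainerClient

container = ContainerClient.from_container_url("<container-sas-url>")

blobs = sorted(container.list_blobs(), key=lambda b: b.last_modified, reverse=True)
for blob in blobs[:20]:  # newest 20 entries first
    print(f"{blob.last_modified:%Y-%m-%d %H:%M}  {blob.size:>10}  {blob.name}")
```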
The repeated SDS_Recogniser_V2 failures are not caused by training limits or the number of documents. They are most likely due to:
- A corrupted file
- A schema mismatch
- Corruption in the V2 model version metadata on the backend
Since SDS_Recogniser_V3 is training successfully, I recommend continuing with that version while we validate the dataset.
I hope this helps. Do let me know if you have any further queries.
Thank you!