An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
Hello rke,
Welcome to Microsoft Q&A. Thank you for reaching out with a detailed case description.
I understand that the model got stuck while training and still shows the status “running” after a day.
Hope the model is working fine now.
In addition to the suggestions provided by Jerald Felix, please let me know if the following helps.
You asked whether this might be caused by having too many regions or fields. Typically, that is not the cause. Extremely complex labelling (hundreds of overlapping regions, inconsistent tagging, or very large PDFs) can slow training, but it does not normally freeze it for 24+ hours. Training may take longer to finish, but it should not stretch beyond a day.
While there's no strict “too many fields” failure limit, performance may degrade if hundreds of fields are defined or highly dense region tagging is used. Incorrectly labelled complex table structures can also contribute.
Since it has been stuck for over a day, it is very likely a backend training job failure. Kindly retry by:
- Deleting the stuck model.
- Recreating the model with a new Model ID and the same dataset.
- Retraining the model with less complexity, for example 5 documents and 5 fields.
- Gradually increasing the complexity if that succeeds.
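If it helps, the "stuck for over a day" check can be automated. The following is only a minimal sketch: it assumes you have already fetched the training operation details as a dictionary with `status` and `createdDateTime` fields (those field names and the status values are my assumption about the operations response), and `looks_stuck` is a hypothetical helper name, not part of any SDK.

```python
from datetime import datetime, timedelta, timezone

# Assumed status values for a job that has not finished yet.
IN_PROGRESS = {"notStarted", "running"}


def looks_stuck(operation: dict, now: datetime,
                max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True if the job is still in progress after max_age."""
    if operation.get("status") not in IN_PROGRESS:
        return False
    # Assumes a timestamp such as "2024-01-01T00:00:00Z"; the trailing "Z"
    # is rewritten so that older Python versions can parse it.
    created = datetime.fromisoformat(
        operation["createdDateTime"].replace("Z", "+00:00")
    )
    return now - created > max_age


if __name__ == "__main__":
    # Illustrative values only, not taken from a real resource.
    sample = {"status": "running", "createdDateTime": "2024-01-01T00:00:00Z"}
    print(looks_stuck(sample, datetime.now(timezone.utc)))
```

The 24-hour default simply mirrors the threshold discussed above; please adjust it to what is normal for your dataset.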
To answer your other question, saving under a new name will not resume the stuck training. The model must be retrained.
Document Intelligence does not support cancelling a training job through the portal/UI once it has started. The recommended approach is to delete the model using the REST API:
DELETE /documentModels/{modelId}
The following resource may be useful:
Document Models - Delete Model - REST API (Azure Azure AI Services) | Microsoft Learn
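For reference, here is a minimal sketch of how the delete call could be built in Python using only the standard library. Please treat the route prefix (`/documentintelligence`), the API version (`2024-11-30`) and the key header name as assumptions to verify against the REST reference above for your resource. `build_delete_request` is a hypothetical helper name, and the endpoint, model ID and key shown are placeholders.

```python
import urllib.parse
import urllib.request

# Assumption: API version of the v4.0 GA release; adjust to match your resource.
API_VERSION = "2024-11-30"


def build_delete_request(endpoint: str, model_id: str, key: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a custom model."""
    base = endpoint.rstrip("/")
    safe_id = urllib.parse.quote(model_id, safe="")
    url = (
        f"{base}/documentintelligence/documentModels/{safe_id}"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Ocp-Apim-Subscription-Key": key},
    )


if __name__ == "__main__":
    # Placeholder values only.
    req = build_delete_request(
        "https://example-resource.cognitiveservices.azure.com",
        "stuck-model",
        "your-key-here",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The request is only constructed here, so it is safe to run as-is. Sending it requires a valid endpoint and key.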
When a job has been stuck for that long, the most likely contributing factors are:
- A failed backend training job
- Storage access issue
- Quota exhaustion
- Regional service issue
To confirm that it is not a model issue alone, kindly check Azure Service Health in the Azure Portal to confirm there are no outages affecting Azure AI Document Intelligence in the region where the resource is deployed.
To rule out a region-specific issue, if the problem persists:
- Try creating a new Document Intelligence resource in another supported region.
- Train the same dataset there.
- Compare results.
To check whether quota exhaustion is the cause, kindly review the Document Intelligence quotas and verify the training job quota for the subscription, the number of custom models already deployed, and whether training concurrency limits were exceeded.
The following reference is a helpful read on service quotas and limits:
Service quotas and limits - Document Intelligence - Foundry Tools | Microsoft Learn
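As a rough illustration of the custom model count check, the sketch below computes how many custom models can still be created from a resource details response. It assumes the response contains a `customDocumentModels` object with `count` and `limit` fields and that it is served from an `/info` route. Both are my assumptions, so please confirm them in the REST reference. `info_url` and `custom_model_headroom` are hypothetical helper names.

```python
# Assumption: API version of the v4.0 GA release; adjust to match your resource.
API_VERSION = "2024-11-30"


def info_url(endpoint: str) -> str:
    """URL assumed to return resource details, including model counts."""
    return f"{endpoint.rstrip('/')}/documentintelligence/info?api-version={API_VERSION}"


def custom_model_headroom(info: dict) -> int:
    """Number of custom models that can still be created on the resource."""
    models = info["customDocumentModels"]
    return models["limit"] - models["count"]


if __name__ == "__main__":
    # Illustrative numbers only, not real quota values.
    sample = {"customDocumentModels": {"count": 480, "limit": 500}}
    print(info_url("https://example-resource.cognitiveservices.azure.com"))
    print(custom_model_headroom(sample))
```

A headroom of zero would suggest deleting unused models before retraining.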
Please refer to the resource below for a head start:
Build and train a custom model - Document Intelligence - Foundry Tools | Microsoft Learn
For further extended understanding, please refer to
Please let me know if you have any questions.
Thank you!