Hi KYOUNGTAEK OH,
Thanks for reporting this. You’re not alone: intermittent “Internal Server Error” failures during custom model training in Document Intelligence Studio have been reported before, even by users who have successfully trained many models in the past. In most cases, the cause isn’t the dataset itself but a temporary backend or configuration condition.
Here are the troubleshooting steps that usually resolve this:
- Validate the training files first (important)
Even if the same files worked earlier, run all training documents through the Layout model in Studio. If any document fails layout analysis, custom training can return an Internal Server Error instead of a validation message. This is one of the most common hidden causes.
- Check service limits and quotas
In the Azure portal, open your Document Intelligence resource → Usage + quotas and confirm:
- You haven’t hit the concurrent training job limit
- Your dataset size and page count are within documented limits for your model type
Quota exhaustion can surface as a generic server error rather than a clear message in Studio. [learn.microsoft.com]
- Retry after a short pause or switch regions
Several users have reported this error during periods of regional backend load. If possible:
- Wait a bit and retry
- Or temporarily create a new Document Intelligence resource in another region and test training there
If training succeeds in the other region, that points to a transient service-side issue rather than a data problem.
- Delete and recreate the failed model
If the model creation partially succeeded in the background, retries may keep failing.
- Delete the failed/stuck model from Studio
- Wait a few minutes
- Start a fresh training run with the same dataset
This has resolved similar “internal error” cases reported by other users. [learn.microsoft.com]
- Use API logs for confirmation (optional but helpful)
If the error continues, triggering training via the REST API or SDK can return more descriptive error details than the Studio UI, which helps confirm whether the failure is data‑related or service‑side. [github.com]
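As a rough illustration of that last step, here is a minimal Python sketch that prepares a build request against the REST API and prints whatever error details come back. Please treat it as a starting point and not an official sample: the API version (2024-11-30), the helper names (build_training_request, flatten_error, submit), and the assumption that the error body nests details under "error" and "innererror" are mine, so check them against the REST reference for your resource. Older resources may need the formrecognizer path and an earlier API version instead.

```python
import json
import urllib.error
import urllib.request

# Assumption: v4.0 GA REST API. Adjust the version (and URL path) for older resources.
API_VERSION = "2024-11-30"


def build_training_request(endpoint, key, model_id, container_sas_url, build_mode="template"):
    """Prepare (but do not send) a custom model build request."""
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels:build"
        f"?api-version={API_VERSION}"
    )
    body = {
        "modelId": model_id,
        "buildMode": build_mode,  # "template" or "neural"
        "azureBlobSource": {"containerUrl": container_sas_url},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def flatten_error(payload):
    """Collect (code, message) pairs from an error body, outermost first.

    Assumes the common Azure shape: {"error": {..., "innererror": {...}}}.
    """
    details = []
    node = payload.get("error") if isinstance(payload, dict) else None
    while isinstance(node, dict):
        details.append((node.get("code"), node.get("message")))
        node = node.get("innererror")
    return details


def submit(request):
    """Send the request. Return the Operation-Location URL, or print error details."""
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.headers.get("Operation-Location")
    except urllib.error.HTTPError as err:
        raw = err.read().decode("utf-8", errors="replace")
        try:
            details = flatten_error(json.loads(raw))
        except json.JSONDecodeError:
            details = []
        print(f"HTTP {err.code}")
        # Fall back to the raw body if it was not in the expected shape.
        for code, message in details or [(None, raw)]:
            print(f"  {code}: {message}")
        return None
```

To try it, pass your resource endpoint, key, a new model ID, and the SAS URL of the training container to build_training_request, then call submit on the result. An inner error code that points at a specific document or at a quota usually means the failure is on the data or configuration side, while a bare internal error with no further detail is more consistent with a service-side issue.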
If all of the above checks pass and the error persists across retries or regions, it’s very likely a service-side issue. Do let me know if you have any further queries.
If this answers your query, please click Accept Answer and select Yes for “Was this answer helpful”.
Thank you!