Error when Training Custom Extraction Model via API

dacello 20 Reputation points
2024-11-18T18:20:14.2633333+00:00

Hello,

I've been working to integrate an application with the Document Intelligence service using the REST API. The flow we want to implement involves training custom extraction (template) models on a per-project basis in our system.

The custom labeling (drawing bounding boxes) for these project-specific templates will happen in an interface in our application. We then upload the labeled documents to a project-scoped training data location in blob storage. Once the upload succeeds, our application calls the Build Model endpoint of the REST API, pointing it at that location.

The problem I'm running into is that we cannot build a model from this programmatically generated and uploaded training dataset, even though it is formatted and named in line with what DI Studio generates. I can confirm that we can create a custom model via the API if we point at the storage location of a dataset generated within DI Studio, but not if we point at the location of our programmatically generated dataset.

This is the error we get from Azure:

Could not build the model: Can't find any valid labels for provided dataset. Generic error during processing labels for [filename]

I can't seem to find any official documentation on what the training data is supposed to look like, and all of the guides I can find assume we would do labeling in DI Studio. We definitely need labeling to happen on our end, so this has been frustrating.

Any help would be appreciated!

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. kothapally Snigdha 490 Reputation points Microsoft Vendor
    2024-11-18T21:50:48.9133333+00:00

    Hi dacello

    I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to "Accept" the answer.

    Issue:

    Error when Training Custom Extraction Model via API.

    Solution:

    I discovered that in my labels.json files we were passing boundingBoxes as a flat array rather than an array of arrays. In my case there is only one box per field, which tripped me up, but that single box still needs to be wrapped in an outer array. With that change I'm able to build a model successfully.
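
    As a rough illustration of that fix, the sketch below walks a parsed labels.json structure and wraps any flat boundingBoxes list in an outer list. It is not taken from the original poster's code: the helper names are hypothetical, and the sample dictionary (its "document", "labels", "value", "page", and "text" keys and the eight-number box) is only an assumed layout for demonstration. The walk is deliberately schema-agnostic, so it relies on nothing beyond the boundingBoxes key name.

```python
import json
from numbers import Real


def is_flat_box(value):
    """Return True if value is a non-empty list made up only of numbers."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(v, Real) and not isinstance(v, bool) for v in value)
    )


def wrap_flat_bounding_boxes(node, key="boundingBoxes"):
    """Recursively wrap every flat `key` list in an outer list, in place.

    Lists that are already nested (an array of arrays) are left untouched,
    so running this twice gives the same result as running it once.
    """
    if isinstance(node, dict):
        for name, child in list(node.items()):
            if name == key and is_flat_box(child):
                node[name] = [child]
            else:
                wrap_flat_bounding_boxes(child, key)
    elif isinstance(node, list):
        for child in node:
            wrap_flat_bounding_boxes(child, key)
    return node


# Hypothetical sample; the real labels.json layout may differ.
sample = {
    "document": "invoice-001.pdf",
    "labels": [
        {
            "label": "Total",
            "value": [
                {
                    "page": 1,
                    "text": "42.00",
                    # Flat array: one box given as a bare list of coordinates.
                    "boundingBoxes": [0.1, 0.2, 0.3, 0.2, 0.3, 0.25, 0.1, 0.25],
                }
            ],
        }
    ],
}

fixed = wrap_flat_bounding_boxes(sample)
print(json.dumps(fixed["labels"][0]["value"][0]["boundingBoxes"]))
```

    A helper like this could run just before the labels files are uploaded to blob storage, so that fields with a single box still serialize as an array of arrays.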

    If you have any other questions or are still running into more issues, please let me know. Thank you again for your time and patience throughout this issue.

    Please remember to "Accept Answer" if any answer/reply helped, so that others in the community facing similar issues can easily find the solution.

