Train model fails with 'ModelBuildError: Could not build the model: Can't find any OCR files for training.'

Paul de Goede 0 Reputation points
2025-01-17T04:14:03.7933333+00:00

Hi,

I have uploaded about 20 PDFs to storage blob sbutilitybills, container: cojtrain. Both are in the UK South region.

I have labelled each of them.

I created a doc AI project in UK South.

When I click the 'Train' button I get a model which immediately fails with 'ModelBuildError: Could not build the model: Can't find any OCR files for training.'.

I have looked online and there are no articles that address this problem.

I have looked at https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/train/custom-model?view=doc-intel-4.0.0 and I meet all the obligations to be able to train a model.

I have tried both Template and Neural models - both fail with the same message.

I have followed all these steps correctly:

https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/how-to-guides/build-a-custom-model?view=doc-intel-4.0.0&source=recommendations

I have checked the storage container and all the files have labels.json associated with them.

Please can you assist with why such a basic error is occurring?

Thanks

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. SriLakshmi C 2,490 Reputation points Microsoft Vendor
    2025-01-17T20:21:18.08+00:00

    Hello Paul de Goede,

    Greetings and Welcome to Microsoft Q&A! Thanks for posting the question.

    From your description, it appears that either the files were not uploaded correctly or the labels were not created properly.

    Here's a workaround:

    • After you've finished building the model, go to the appropriate storage account and select the blob container you've created.
    • You can upload all of your files by selecting the upload button at the top. This is one method for uploading files. Following this, you can proceed to the studio, where the files can be seen. Note: Please ensure that you are in the correct storage account -> blob container folder.
    • If the files are successfully uploaded, we can see two files in blob containers named filename.jpg and filename.jpg.ocr.json for each uploaded file.
    • Then, in Form Recognizer (now Document Intelligence) Studio, select the + icon and create labels for each file; the labels.json file will be created in the blob container, and the model can then be trained and tested.
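Since the error message is about missing OCR files, one quick way to confirm the third bullet is to list the blob names in the container and look for documents without an OCR result next to them. The following is a minimal sketch, not an official tool: the `.ocr.json` suffix follows the `filename.jpg.ocr.json` pattern described above, while the extension list, the function name, and the sample listing are assumptions made for illustration. In practice the names could come from `ContainerClient.list_blobs()` in the `azure-storage-blob` package (each item has a `.name`).

```python
from typing import Iterable

# Sidecar suffix, following the "filename.jpg.ocr.json" pattern in the answer.
OCR_SUFFIX = ".ocr.json"
# File types treated as training documents here (an illustrative list).
DOC_EXTENSIONS = (".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp")


def documents_missing_ocr(blob_names: Iterable[str]) -> list[str]:
    """Return training documents that have no '<name>.ocr.json' next to them."""
    names = set(blob_names)
    missing = []
    for name in sorted(names):
        is_document = name.lower().endswith(DOC_EXTENSIONS)
        if is_document and name + OCR_SUFFIX not in names:
            missing.append(name)
    return missing


# Hypothetical container listing used only to demonstrate the check.
listing = [
    "bill-001.pdf",
    "bill-001.pdf.ocr.json",
    "bill-001.pdf.labels.json",
    "bill-002.pdf",
    "bill-002.pdf.labels.json",
    "fields.json",
]
print(documents_missing_ocr(listing))  # -> ['bill-002.pdf']
```

Any document reported here has labels but no OCR output, which would be consistent with the "Can't find any OCR files" message.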

    Also verify the following:

    Check OCR files: Verify that all documents in the storage container have OCR-extracted data. You can test this by manually running OCR with the Azure Document Intelligence Analyze API to confirm that the files are processed correctly.
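As a rough illustration of running OCR manually, the sketch below only assembles the pieces of an analyze request; it does not send anything. The route, the `2024-11-30` API version, the `prebuilt-layout` model, and the `urlSource` body field reflect my understanding of the v4.0 REST API and should be checked against the REST reference for your resource. The endpoint, key, and document URL are placeholders, and a blob in a private container would typically need a SAS URL.

```python
from urllib.parse import quote, urlencode


def build_analyze_url(endpoint: str, model_id: str = "prebuilt-layout",
                      api_version: str = "2024-11-30") -> str:
    """Build the analyze URL (assumed v4.0 route; verify for your resource)."""
    base = endpoint.rstrip("/")
    path = f"/documentintelligence/documentModels/{quote(model_id)}:analyze"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


def build_analyze_request(endpoint: str, key: str, document_url: str):
    """Return (url, headers, json_body) for analysing a document by URL."""
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = {"urlSource": document_url}
    return build_analyze_url(endpoint), headers, body


# Placeholder values; replace with your own endpoint, key, and document URL.
url, headers, body = build_analyze_request(
    "https://example.cognitiveservices.azure.com/",
    "<your-key>",
    "https://example.blob.core.windows.net/cojtrain/bill-001.pdf",
)
print(url)
```

Sending this as a POST with any HTTP client should return an `Operation-Location` header to poll for the result. If the layout result comes back with text for a sample bill, the document itself is readable and the problem is more likely the missing `.ocr.json` files in the container.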

    File naming and structure: Ensure that each document and its corresponding labels.json file are correctly paired and placed in the same folder.
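The pairing check can be sketched as a small function over the blob names in the container. This is an assumption-laden example rather than an official validator: it assumes label files are named `<document>.labels.json`, and the function name and sample paths are hypothetical.

```python
import posixpath
from collections import defaultdict

# Assumed naming convention for label files: "<document>.labels.json".
LABELS_SUFFIX = ".labels.json"


def orphaned_label_files(blob_names):
    """Map each folder to label files whose document is not in that folder."""
    names = set(blob_names)
    orphans = defaultdict(list)
    for name in sorted(names):
        if name.endswith(LABELS_SUFFIX):
            # The paired document is the label file name minus the suffix.
            document = name[: -len(LABELS_SUFFIX)]
            if document not in names:
                orphans[posixpath.dirname(name)].append(name)
    return dict(orphans)


# Hypothetical listing: the second bill's labels sit in a different folder.
listing = [
    "train/bill-001.pdf",
    "train/bill-001.pdf.labels.json",
    "train/bill-002.pdf",
    "labels/bill-002.pdf.labels.json",
]
print(orphaned_label_files(listing))
# -> {'labels': ['labels/bill-002.pdf.labels.json']}
```

An empty result means every label file found has its document alongside it in the same folder.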

    Region consistency: Verify that the storage account and the Document Intelligence project are located in the same region. Any mismatch in regions can lead to access issues, so align these resources for successful training.

    I hope this helps. If you have any further queries, do let us know.

    Thank you!
