Do custom trained classifier models in Document Intelligence Studio return the text of the documents?

Question

Destin Hebert 45

Do custom trained classifier models in Document Intelligence Studio return the text of the documents? The response from my custom trained classifier shows that it does a wonderful job of classifying my documents as it has been trained, but it does not return any of the content inside my pages/documents. Is this the expected behavior of begin_classify_document, or do I need to run two models: my custom classifier and a prebuilt model that extracts the text (such as prebuilt-layout)?

Current model version: 2023-10-31
Current Python SDK: azure-ai-documentintelligence==1.0.0b1

Answer accepted by question author

Answer 1

Hello @Destin Hebert, thanks for using the Microsoft Q&A Platform.

This is a current limitation of the custom classification model. The model combines layout and language features to detect and identify the documents you process within your application, but it does not return their text.

Yes, the model response contains only the identified documents, with their associated page ranges, in the documents section of the response. After you train a classifier model, you will see in the blob storage data that filename.ocr.json files are generated automatically once training is done. Custom extraction models, by contrast, have a .labels.json file for each file because the data is labeled.

My suggestion is similar to what you mentioned: combine the custom classification model with custom extraction or prebuilt models. You can create a custom classification model, custom extraction or prebuilt models, and post-processing code that combines their outputs into the expected result.
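
As a rough illustration of the post-processing step, here is a minimal sketch that attaches extracted text to each classified document. It assumes the classifier result exposes `documents` with `doc_type`, `confidence`, and `bounding_regions` (each carrying a `page_number`), and that the layout result exposes `pages` with `page_number` and `lines` (each carrying `content`). Please verify these attribute names against the SDK version you have installed. The function name `text_by_document` and the sample objects are hypothetical stand-ins so the sketch runs without calling the service.

```python
from types import SimpleNamespace


def text_by_document(classify_result, layout_result):
    """Attach page text from a layout result to each classified document."""
    # Map page number -> text by joining the lines extracted for that page.
    page_text = {
        page.page_number: "\n".join(line.content for line in (page.lines or []))
        for page in layout_result.pages
    }
    combined = []
    for doc in classify_result.documents:
        # Assumption: each bounding region names one page of this document.
        page_numbers = sorted({r.page_number for r in (doc.bounding_regions or [])})
        combined.append({
            "doc_type": doc.doc_type,
            "confidence": doc.confidence,
            "pages": page_numbers,
            "text": "\n".join(page_text.get(n, "") for n in page_numbers),
        })
    return combined


def _page(number, *lines):
    # Hypothetical stand-in for one page of a layout result.
    return SimpleNamespace(
        page_number=number,
        lines=[SimpleNamespace(content=text) for text in lines],
    )


def _doc(doc_type, confidence, *page_numbers):
    # Hypothetical stand-in for one classified document.
    return SimpleNamespace(
        doc_type=doc_type,
        confidence=confidence,
        bounding_regions=[SimpleNamespace(page_number=n) for n in page_numbers],
    )


sample_layout = SimpleNamespace(pages=[
    _page(1, "Invoice 001", "Total: 10"),
    _page(2, "Page two"),
    _page(3, "Receipt"),
])
sample_classify = SimpleNamespace(documents=[
    _doc("invoice", 0.98, 1, 2),
    _doc("receipt", 0.91, 3),
])

combined = text_by_document(sample_classify, sample_layout)
```

In real use, the two arguments would be the results returned by the classifier call and by a text-extracting model such as prebuilt-layout, run on the same file.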

I hope this helps.

Regards,
Vasavi

Please accept the answer and vote 'yes' if you found it helpful, to support the community. Thanks.

Destin Hebert 45 Reputation points

2024-01-25T21:20:31.0366667+00:00

@VasaviLankipalle-MSFT

Thanks so much for your help. Do you have any tips on how to optimize the process of calling both the prebuilt-document model to get the extracted text from the documents and the custom classifier to get the categories of each document? I notice that it takes nearly 2 minutes to call and get a return from both models but only 1 second to post process the returned data. Thank you once again.
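
One possible starting point, offered as a sketch rather than an official recommendation: the classification call and the text-extraction call do not depend on each other, so they can be submitted at the same time instead of one after the other. The example below uses Python's standard `concurrent.futures` module. The helper `run_concurrently` and the two `fake_*` functions are hypothetical; in real use each callable would wrap one blocking service call and its wait for the result. Whether this shortens the total time depends on service-side latency and throttling, which this sketch does not model.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(classify_call, extract_call):
    """Run two independent blocking calls in parallel threads."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        classify_future = pool.submit(classify_call)
        extract_future = pool.submit(extract_call)
        # result() blocks until each call finishes and re-raises its errors.
        return classify_future.result(), extract_future.result()


def fake_classify():
    # Stand-in for the classifier call; the sleep simulates waiting on the service.
    time.sleep(0.2)
    return "classification"


def fake_extract():
    # Stand-in for the text-extraction call.
    time.sleep(0.2)
    return "layout"


classified, extracted = run_concurrently(fake_classify, fake_extract)
```

With this pattern the wall-clock time is roughly that of the slower call rather than the sum of both.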
