How to get the same table results from the Azure Form Recognizer Python SDK as the custom model shows in the UI?

Michael David Robinston 21 Reputation points
2022-09-30T04:24:05.367+00:00

Dear All,

I trained a custom neural model in the UI on unstructured documents so that my results come out organized as three tables. The model works really well and I get what I want in the UI. However, when I download the sample code and run the same model with the Python SDK, I do get the results, but not as a table or dataframe as in the UI. Instead I get a huge list object like this:

[DocumentField(value_type=dictionary, value={'Contract_ID': DocumentField(value_type=string, value='2412D', content=2412D, bounding_regions=[BoundingRegion(page_number=1, bounding_box=[Point(x=1.4804, y=7.0012), Point(x=1.9012, y=7.0012), Point(x=1.9012, y=7.133), Point(x=1.4804, y=7.133)])], spans=[DocumentSpan(offset=92, length=6)], confidence=None), 'Customer_Name': DocumentField(value_type=string, value='DUNE........

and so on. This is what I get when I call

poller = document_analysis_client.begin_analyze_document_from_url(model_id, formUrl)
result = poller.result()

and then iterate through

for idx, document in enumerate(result.documents):
    #print("--------Analyzing document #{}--------".format(idx + 1))
    #print("Document has type {}".format(document.doc_type))
    #print("Document has confidence {}".format(document.confidence))
    #print("Document was analyzed by model with ID {}".format(result.model_id))
    for name, field in document.fields.items():
        field_value = field.value if field.value else field.content
        print("......found field of type '{}' with value '{}' and with confidence {}".format(field.value_type, field_value, field.confidence))

I have verified that result.documents is the list object that holds the exact output values of my model. How can I iterate through it and get what I need into a dataframe? For example, in the result above, Contract_ID is one of my column names and 2412D is its row value. I would like to have this as a dataframe. (Request to Microsoft: it would be much better if the SDK returned exactly what we see in the UI. In this case, the same three tables I got in the UI could be provided as dataframes in the SDK.)

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

2 answers

  1. Michael David Robinston 21 Reputation points
    2022-10-03T04:57:34.537+00:00

    For those in the same situation: we first have to understand the JSON output of the custom model. The JSON has many components, of which the Analyze Result component is the one that holds both Form Recognizer's default layout result (including tables) and all of our model's results.

    The model's results (whether you created dynamic tables in the UI or key-value pairs) sit inside the component called documents, which is inside Analyze Result. The only downside is that the results inside documents are key-value pairs, not table objects like the ones in the default result.tables. So you have two options. Either convert the document to a list, then use list indexing (and a loop if needed) to reach what you want and bring the key-value pairs out as a dataframe. Or access result.documents directly (remember to get the result from the poller first) and use a loop to pull things out of the fields.

    So the flow is Analyze Results ---> Documents ---> fields (all your model's results will be inside this key)
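
The flow above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: it assumes each table shows up in document.fields as a field with value_type "list" whose items are "dictionary" fields keyed by column name, which matches the output printed in the question but may differ for other table types. The helper names (cell_value, table_field_to_rows, document_to_tables), the table name "Contracts", and the second row's value are made up for illustration, and the stand-in objects only mimic the attributes the code reads.

```python
from types import SimpleNamespace


def cell_value(field):
    """Return a field's parsed value, falling back to its raw content."""
    if field is None:
        return None
    value = getattr(field, "value", None)
    return value if value is not None else getattr(field, "content", None)


def table_field_to_rows(table_field):
    """Turn a list field whose items are dictionary fields into plain dict rows."""
    rows = []
    for row_field in table_field.value or []:
        columns = row_field.value or {}
        rows.append({name: cell_value(cell) for name, cell in columns.items()})
    return rows


def document_to_tables(document):
    """Map each list-typed field name to its rows; other fields are skipped."""
    return {
        name: table_field_to_rows(field)
        for name, field in document.fields.items()
        if field.value_type == "list"
    }


# Stand-in objects that only mimic the attributes used above (not SDK classes).
def fake_field(value_type, value, content=None):
    return SimpleNamespace(value_type=value_type, value=value, content=content)


fake_document = SimpleNamespace(fields={
    # "Contracts" is a hypothetical table name; the second row is made-up data.
    "Contracts": fake_field("list", [
        fake_field("dictionary", {"Contract_ID": fake_field("string", "2412D", "2412D")}),
        fake_field("dictionary", {"Contract_ID": fake_field("string", None, "9999X")}),
    ]),
    "Some_Key": fake_field("string", "not a table"),
})

tables = document_to_tables(fake_document)
print(tables)
```

With a real result, the same function would be called once per entry in result.documents. Each list of rows can then be passed to pandas.DataFrame(rows), which builds one column per dictionary key.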


  2. romungi-MSFT 48,906 Reputation points Microsoft Employee Moderator
    2022-09-30T10:42:34.42+00:00

    @Michael David Robinston Have you tried to check the Tables property of the result? This should provide the extracted table from your document.

    for (int i = 0; i < result.Tables.Count; i++)  
     {  
         DocumentTable table = result.Tables[i];  
         Console.WriteLine($"  Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");  
          
         foreach (DocumentTableCell cell in table.Cells)  
         {  
             Console.WriteLine($"    Cell ({cell.RowIndex}, {cell.ColumnIndex}) has kind '{cell.Kind}' and content: '{cell.Content}'.");  
         }  
     }  
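
Since the question is about the Python SDK, here is a rough Python counterpart to the C# loop above. It is a sketch, not the official sample: it assumes the Python result exposes a tables list whose items have row_count, column_count, and cells with row_index, column_index, and content. The helper name table_to_grid, the "Example_Column" header, and the stand-in table are invented for illustration, and merged cells (row or column spans) are not handled.

```python
from types import SimpleNamespace


def table_to_grid(table):
    """Place each cell's content at its (row_index, column_index) position."""
    grid = [[None] * table.column_count for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


# Stand-in table (not an SDK object); it only mimics the attributes read above.
fake_table = SimpleNamespace(
    row_count=2,
    column_count=2,
    cells=[
        SimpleNamespace(row_index=0, column_index=0, content="Contract_ID"),
        SimpleNamespace(row_index=0, column_index=1, content="Example_Column"),
        SimpleNamespace(row_index=1, column_index=0, content="2412D"),
        # Cell (1, 1) is deliberately missing to show that gaps stay None.
    ],
)

grid = table_to_grid(fake_table)
print(grid)
```

With a real result, the grids could be built with [table_to_grid(t) for t in result.tables]. If the first row of a grid holds the headers (an assumption that should be checked per table), pandas.DataFrame(grid[1:], columns=grid[0]) turns it into a dataframe.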
    

    If you have defined a dynamic table, though, it is not yet available through the SDK. A similar thread on this was created recently.
