Titanium-5542 asked ramr-msft answered

Extract table data and put them into dictionary with azure form recognizer

I have searched for answers related to my question but found none.

Below is the working code I have tried:

import json
from azure.core.exceptions import ResourceNotFoundError
from azure.ai.formrecognizer import FormRecognizerClient, FormTrainingClient
from azure.core.credentials import AzureKeyCredential

credentials = json.load(open("creds.json"))

API_KEY = credentials["API_KEY"]
ENDPOINT = credentials["ENDPOINT"]

# URL of a PDF (or image) that contains tables
url = "https://some_pdf_url_which_contains_tables.pdf"

form_recognizer_client = FormRecognizerClient(ENDPOINT, AzureKeyCredential(API_KEY))
poller = form_recognizer_client.begin_recognize_content_from_url(url)
form_data = poller.result()

for page in form_data:
    for table in page.tables:
        for cell in table.cells:
            # Print the whole cell text (looping over cell.text would print one character at a time)
            print(cell.text)

## But I need the table in dictionary format, with header names as keys and
## the cell values as values, not just plain text.


@Titanium-5542 Thanks for the question. You could try labeling the table as key-value pairs and then convert the JSON output to the required format.
Example:
Form Recognizer provides JSON output; we do not yet provide CSV output. You can convert the JSON into a CSV file as a post-processing step.

The following sample code converts invoice output into a CSV file. You can extend it to output tables and more.

https://github.com/Azure-Samples/cognitive-services-quickstart-code/blob/master/python/FormRecognizer/rest/python-invoices.md#sample-python-script-to-extract-invoice-or-a-batch-of-invoices-into-a-csv-file
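
As a rough illustration of the post-processing step described above, here is a minimal sketch that writes row dictionaries to CSV text using only the Python standard library. The function name `records_to_csv` and the sample rows are hypothetical and not part of the linked sample; the sketch assumes the table has already been turned into a list of dictionaries, one per row.

```python
import csv
import io


def records_to_csv(records):
    """Serialize a list of row dicts to CSV text, one column per distinct key."""
    # Collect column names in order of first appearance across all rows.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    # restval fills in an empty string when a row lacks one of the columns.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical rows, for illustration only.
rows = [{"Item": "Pen", "Price": "1.50"}, {"Item": "Ink", "Qty": "2"}]
print(records_to_csv(rows))
```

Returning a string keeps the sketch easy to test; to write a file, the same `DictWriter` can be pointed at an open file handle instead.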


Hi, thanks for the reply. I did try it, but it didn't work. Can you please provide a working example of what you're describing? Thank you.


Also, I can't do manual labeling for hundreds of documents. I want to use the general Form Recognizer model and get the table data as key-value pairs. Form Recognizer detects the headers in a table, right? I just want the headers as keys and the data under each header as values. Please help. At the very least, I need the table headers and the table data separately, not mixed together.


1 Answer

ramr-msft answered

@Titanium-5542 Thanks for the details. Can you please add more details about your input document and use case?

Extract Column Header Information:
Layout supports column header recognition - The updated Layout API table feature adds header recognition with column headers that can span multiple rows. Each table cell has an attribute that indicates whether it's part of a header or not. This can be used to identify which rows make up the table header.
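
To make the header attribute concrete, here is a minimal sketch that groups table cells into one dictionary per data row, keyed by column header. It assumes each cell exposes `row_index`, `column_index`, a text attribute (`content` or `text`), and a `kind` attribute equal to `"columnHeader"` for header cells, as in the newer `DocumentTableCell` objects; older `FormTableCell` objects expose an `is_header` flag instead, so the check would need adjusting. The function name `table_to_records` and the stand-in cells are hypothetical, and treating the first row as the header when no cell is flagged is an assumption of this sketch, not documented service behavior.

```python
from collections import defaultdict
from types import SimpleNamespace


def table_to_records(cells):
    """Turn a flat list of table cells into one dict per data row."""
    header_parts = defaultdict(list)  # column_index -> header texts, top to bottom
    rows = defaultdict(dict)          # row_index -> {column_index: text}

    for cell in sorted(cells, key=lambda c: (c.row_index, c.column_index)):
        text = getattr(cell, "content", None) or getattr(cell, "text", "")
        if getattr(cell, "kind", "content") == "columnHeader":
            header_parts[cell.column_index].append(text)
        else:
            rows[cell.row_index][cell.column_index] = text

    # Assumption: if no cell is flagged as a header, use the first row as the header.
    if not header_parts and rows:
        first = min(rows)
        for col, text in rows.pop(first).items():
            header_parts[col].append(text)

    # Headers that span several rows are joined with a space.
    headers = {col: " ".join(parts) for col, parts in header_parts.items()}
    return [
        {headers.get(col, f"column_{col}"): text for col, text in sorted(row.items())}
        for _, row in sorted(rows.items())
    ]


# Stand-in cells, for illustration only; real cells would come from table.cells.
demo = [
    SimpleNamespace(row_index=0, column_index=0, content="Item", kind="columnHeader"),
    SimpleNamespace(row_index=0, column_index=1, content="Price", kind="columnHeader"),
    SimpleNamespace(row_index=1, column_index=0, content="Pen", kind="content"),
    SimpleNamespace(row_index=1, column_index=1, content="1.50", kind="content"),
]
print(table_to_records(demo))  # [{'Item': 'Pen', 'Price': '1.50'}]
```

The sketch ignores `row_span` and `column_span`, so merged cells would need extra handling. With real results it could be called as `table_to_records(table.cells)` for each table.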

Please follow the documentation, which could help.

Here is a link to the General document model, which analyzes and extracts text, tables, structure, key-value pairs, and named entities.
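
For the key-value pairs mentioned above, a minimal sketch follows. It assumes version 3.2 or later of the `azure-ai-formrecognizer` package, where `DocumentAnalysisClient` accepts the `"prebuilt-document"` model ID and the result exposes `key_value_pairs` whose `key` and `value` each carry a `content` string. The helper names `key_value_pairs_to_dict` and `analyze_url` are hypothetical, and the stand-in pairs below only imitate that shape.

```python
from types import SimpleNamespace


def key_value_pairs_to_dict(pairs):
    """Collapse key-value pair objects into a plain dict of strings."""
    result = {}
    for pair in pairs:
        key = pair.key.content if pair.key else None
        if not key:
            continue  # skip pairs without a usable key
        # A key can be detected without a value, so value may be None.
        result[key] = pair.value.content if pair.value else None
    return result


def analyze_url(endpoint, api_key, url):
    """Run the general document model on a URL (assumes SDK version 3.2 or later)."""
    # Imported lazily so the helper above can be tried without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(api_key))
    poller = client.begin_analyze_document_from_url("prebuilt-document", url)
    return poller.result()


# Stand-in pairs, for illustration only.
demo_pairs = [
    SimpleNamespace(key=SimpleNamespace(content="Invoice No"), value=SimpleNamespace(content="42")),
    SimpleNamespace(key=SimpleNamespace(content="Notes"), value=None),
]
print(key_value_pairs_to_dict(demo_pairs))  # {'Invoice No': '42', 'Notes': None}
```

With a real resource, the call would look like `key_value_pairs_to_dict(analyze_url(ENDPOINT, API_KEY, url).key_value_pairs)`. If the same key appears more than once in a document, later values overwrite earlier ones in this sketch.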