Azure Form Recognizer Layout Model

Question

Anonymous

Hi!

I'm using the Azure Form Recognizer Layout model, Azure Functions, and Azure Storage to extract tables from a PDF file and output them as CSV files.

I'm developing the function by following this tutorial:

https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/tutorial-azure-function?view=doc-intel-3.1.0&viewFallbackFrom=form-recog-3.0.0#create-an-azure-functions-project

However, I have a problem extracting tables.

There are more than two tables on a single page, but only one table is extracted from each page.

I checked resp_json and found that it contains only one table per page.
resp_json appears in paragraph 5 of the "Add document processing code" section of the tutorial linked above.

Please advise me how to get all the tables.

Additional information:

- When I try Form Recognizer Studio with the same PDF, I get a result (JSON file) that includes all the tables within a minute.
- Increasing wait_sec in time.sleep(wait_sec) helps a little.
  - This code is in paragraph 5 of the "Add document processing code" section.
  - I changed wait_sec from 25 s to 1000 s.
  - In that case, two tables were extracted from one of the pages, but still not all of the tables.

For these reasons, I suppose the problem lies in Functions, not in Form Recognizer.

I wonder if it's related to the execution time of Functions, although Form Recognizer Studio doesn't take long to extract all the tables.

The code containing resp_json and wait_sec was attached as a screenshot.
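
A note on the fixed wait: rather than guessing a value for wait_sec, the result URL can be polled until the service reports a terminal status. The following is only a minimal sketch and is not the tutorial's code. The names wait_for_result, fetch, interval_sec, and timeout_sec are hypothetical, and fetch stands in for whatever GET request the function sends to the result URL. It assumes the response JSON carries a status field with values such as "running", "succeeded", and "failed".

```python
import time
from typing import Callable, Dict


def wait_for_result(fetch: Callable[[], Dict],
                    interval_sec: float = 2.0,
                    timeout_sec: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call fetch() repeatedly until the analysis succeeds, fails, or times out.

    fetch is a hypothetical callable that returns the parsed JSON body of the
    "get result" request. sleep is injectable so the loop can be tested.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        body = fetch()
        status = str(body.get("status", "")).lower()
        if status == "succeeded":
            return body  # complete result, ready to be parsed
        if status == "failed":
            raise RuntimeError(f"analysis failed: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"analysis still '{status}' after {timeout_sec} s")
        sleep(interval_sec)  # wait briefly, then ask again
```

In a function like the tutorial's, this might be called as wait_for_result(lambda: requests.get(get_url, headers=headers).json()), assuming get_url holds the Operation-Location header returned by the POST request. Polling only removes the guesswork about timing; the thread below resolves the missing tables by changing the API version.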

Ramr-msft 17,826 Reputation points

2023-07-27T03:45:44.3433333+00:00

@Anonymous Thanks for the question. Can you please add more details about the version you are using? We recommend using the latest API version.

Anonymous

2023-07-27T06:18:23.1833333+00:00

@Ramr-msft

Thanks a lot for your reply.

I'm afraid I can't figure out how to check the API version.

Instead, I've checked the following versions.

I'm using Core Tools version 4.0.5198, which is the latest.

The Functions runtime version is 4.21.1.20667, and the latest version is 4.x.

The version written in the host.json file is 2.0.

Anonymous

2023-07-27T06:53:57.85+00:00

Also in host.json, the version of extensionBundle is as follows:

"version": "[3.*, 4.0.0)"

Ramr-msft 17,826 Reputation points

2023-07-28T03:40:35.09+00:00

@Anonymous Thanks for the details. You may be using the v2.1 API; we recommend following the API documentation linked below.

Can you please share the post_url that you are using? Please update the post_url to the latest version, as given below.

https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-2023-07-31/operations/AnalyzeDocument

Anonymous

2023-07-28T12:06:21.64+00:00

@Ramr-msft

Thanks a lot for your advice!

The post_url I'm using is as follows:

post_url = endpoint + "/formrecognizer/v2.1/layout/analyze"

As you pointed out, it seems the API version I'm using is old.

I have additional questions.

1. Can I use the current URL format if I change "v2.1" to a new version?

   Actually, changing "v2.1" to "2023-07-31" or "2022-08-31" didn't work.

2. If 1. is not possible, which URL should I assign to post_url?

   https://{endpoint}/formrecognizer/documentModels/{modelId}:analyze?api-version=2023-07-31[&pages][&locale][&stringIndexType][&features][&queryFields]

   or

   conn = httplib.HTTPSConnection('westus.api.cognitive.microsoft.com')
   conn.request("POST", "/formrecognizer/documentModels/{modelId}:analyze?api-version=2023-07-31&%s" % params, "{body}", headers)

3. Model ID

   Is modelId different from a managed ID or a resource ID?

   I wonder how I can check it.

   Is it required only for custom models and not for the layout model?

4. API version of the Layout model in Form Recognizer Studio

   I'm using the layout model, and its API version is 2022-08-31.

   I was able to get all tables using this.

   Should I set the API version to 2022-08-31?
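
For readers comparing the two URL shapes quoted above: in the newer format the version is no longer a path segment (v2.1) but an api-version query parameter, and the model is named in the path, which would explain why swapping "v2.1" for a date string did not work. The sketch below builds the newer URL under some assumptions. build_analyze_url and build_headers are hypothetical helper names, endpoint is assumed to include the https:// scheme as in the post_url line above, and "prebuilt-layout" is the model ID that the Document Intelligence documentation gives for the layout model (it is not stated in this thread).

```python
from typing import Dict


def build_analyze_url(endpoint: str,
                      model_id: str = "prebuilt-layout",
                      api_version: str = "2023-07-31") -> str:
    """Return the analyze URL in the documentModels/{modelId}:analyze format.

    endpoint is assumed to look like "https://<resource>.cognitiveservices.azure.com".
    """
    base = endpoint.rstrip("/")  # avoid a double slash when joining
    return (f"{base}/formrecognizer/documentModels/"
            f"{model_id}:analyze?api-version={api_version}")


def build_headers(api_key: str,
                  content_type: str = "application/pdf") -> Dict[str, str]:
    """Headers for sending the PDF bytes directly as the request body."""
    return {
        "Content-Type": content_type,
        "Ocp-Apim-Subscription-Key": api_key,
    }
```

With these helpers, the tutorial's line could become post_url = build_analyze_url(endpoint), with the rest of the POST request left unchanged. Passing api_version="2022-08-31" would target the version the Studio reported.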

Anonymous

2023-08-04T05:57:21.2133333+00:00

@Ramr-msft

Thank you very much for your advice!

I changed the post_url to the format given in the link, and it worked.

I really appreciate your help in resolving this issue!

Ramr-msft 17,826 Reputation points

2023-08-09T04:16:37.01+00:00

@Anonymous Thanks for the update. I have converted my comment to an answer; please accept it so that it's helpful to other community members.

Accepted answer

Answer 1

Ramr-msft 17,826

@Anonymous Thanks for the details. The model IDs are listed at the following URL. Yes, you can set the API version to the latest, "2023-07-31", and the endpoint details are also available at that link.

[Screenshot attached to the answer]
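
One follow-up point for anyone making the same switch: in the 2022-08-31 and 2023-07-31 responses, tables are listed under analyzeResult.tables for the whole document, whereas the v2.1 response nests them per page under analyzeResult.pageResults. Parsing code written for v2.1 may therefore need adjusting. The sketch below shows one way to turn the newer structure into CSV text. table_to_rows and tables_to_csv are hypothetical names, and the field names used (rowCount, columnCount, cells, rowIndex, columnIndex, content) follow the documented response format rather than anything shown in this thread.

```python
import csv
import io
from typing import Dict, List


def table_to_rows(table: Dict) -> List[List[str]]:
    """Place each cell's text into a rowCount x columnCount grid."""
    rows = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        rows[cell["rowIndex"]][cell["columnIndex"]] = cell.get("content", "")
    return rows


def tables_to_csv(resp_json: Dict) -> List[str]:
    """Return one CSV string per table found in the analyze result."""
    csv_texts = []
    for table in resp_json.get("analyzeResult", {}).get("tables", []):
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerows(table_to_rows(table))
        csv_texts.append(buf.getvalue())
    return csv_texts
```

Each returned string could then be uploaded to storage as its own CSV file. Merged cells (rowSpan or columnSpan greater than 1) are written only at their top-left position in this sketch.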
