FormRecognizer Layout Model, DocumentLine whitespace configuration

Vincent Villacorta 21 Reputation points
2022-02-25T23:50:27.213+00:00

Hello!

I have a question about Form Recognizer's Layout model when using AnalyzeResult. When extracting DocumentLine objects (AnalyzeResult.DocumentPage.DocumentLine), the model seems to group words found on the same horizontal line into one DocumentLine and to start a new DocumentLine wherever the words are separated by a wide enough stretch of whitespace. Is the width of the whitespace that separates DocumentLine objects something that can be configured?

Thanks,

C#
An object-oriented and type-safe programming language that has its roots in the C family of languages and includes support for component-oriented programming.
10,578 questions
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
1,508 questions

1 answer

  1. romungi-MSFT 43,656 Reputation points Microsoft Employee
    2022-02-28T08:29:02.703+00:00

    @Vincent Villacorta The Layout model API does not have a parameter for configuring the width of the whitespace used to separate lines. In v2.1 there is an option to pass the reading order, but v3.0 has no such option because natural reading order is used by all features.

    In Form Recognizer v2.1, you can specify the order in which the text lines are output with the readingOrder query parameter. Use natural for a more human-friendly reading order output. This feature is only supported for Latin languages.
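
For illustration, here is a minimal Python sketch of how a v2.1 Layout request URL might carry the readingOrder query parameter. The path shape and the two accepted values (basic, natural) reflect the v2.1 Layout REST reference as I understand it, so please check them against the reference for your resource. The helper name layout_analyze_url and the example endpoint are hypothetical.

```python
from urllib.parse import urlencode

# Values believed to be accepted by the v2.1 readingOrder parameter (assumption).
ALLOWED_READING_ORDERS = ("basic", "natural")


def layout_analyze_url(endpoint: str, reading_order: str = "natural") -> str:
    """Build a v2.1 Layout analyze URL with the readingOrder query parameter.

    `endpoint` is the base URL of your resource (hypothetical in this sketch).
    """
    if reading_order not in ALLOWED_READING_ORDERS:
        raise ValueError(f"unsupported reading order: {reading_order!r}")
    query = urlencode({"readingOrder": reading_order})
    # Strip a trailing slash so the path is not doubled up.
    return f"{endpoint.rstrip('/')}/formrecognizer/v2.1/layout/analyze?{query}"


if __name__ == "__main__":
    # Example endpoint is a placeholder, not a real resource.
    print(layout_analyze_url("https://example.cognitiveservices.azure.com/"))
```

The resulting URL would be the target of the POST request that submits the document. Note that readingOrder only changes the order in which lines are output; it does not change how wide a gap has to be before a new line starts.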

    In Form Recognizer v3.0, the natural reading order output is used by the service in all cases. Therefore, there is no readingOrder parameter provided in this version.
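
Since the service does not expose the gap width, one possible workaround is to regroup the returned lines on the client side. The sketch below is written in Python, but the same logic ports directly to C#. It assumes each line has already been reduced to its text plus an axis-aligned box derived from the bounding coordinates that the Layout result returns for every line. The Line class, merge_lines, max_gap, and min_overlap are hypothetical names for this illustration and are not part of any SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    """Hypothetical stand-in for a layout line: text plus axis-aligned bounds."""
    text: str
    left: float
    right: float
    top: float
    bottom: float


def vertical_overlap(a: Line, b: Line) -> float:
    """Fraction of the shorter line's height that the two lines share."""
    overlap = min(a.bottom, b.bottom) - max(a.top, b.top)
    shorter = min(a.bottom - a.top, b.bottom - b.top)
    return overlap / shorter if shorter > 0 else 0.0


def merge_lines(lines: List[Line], max_gap: float, min_overlap: float = 0.5) -> List[Line]:
    """Join lines on the same row whose horizontal gap is at most max_gap."""
    merged: List[Line] = []
    # Process left to right so each line can only extend something to its left.
    for line in sorted(lines, key=lambda l: l.left):
        for i, prev in enumerate(merged):
            same_row = vertical_overlap(prev, line) >= min_overlap
            gap = line.left - prev.right
            if same_row and gap <= max_gap:
                # Grow the existing line to cover both boxes.
                merged[i] = Line(
                    text=f"{prev.text} {line.text}",
                    left=prev.left,
                    right=max(prev.right, line.right),
                    top=min(prev.top, line.top),
                    bottom=max(prev.bottom, line.bottom),
                )
                break
        else:
            merged.append(line)
    # Return in a simple top-to-bottom, left-to-right order.
    return sorted(merged, key=lambda l: (l.top, l.left))
```

Here max_gap is expressed in the same unit as the coordinates in the result, so a suitable value depends on the document and would need tuning. Raising it joins fragments that the service split apart; it cannot split a line that the service already returned as one, for which the word-level boxes would be needed instead.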

    If you could share the document and results from the API, we could pass the feedback to the team for investigation.
