FormRecognizer Layout Model, DocumentLine whitespace configuration

Question

Hello!

I have a question about FormRecognizer's Layout Model when using AnalyzeResult. When extracting DocumentLine objects (AnalyzeResult.DocumentPage.DocumentLine), the service seems to build each DocumentLine from words found on the same horizontal line and to split lines at whitespace gaps. Is the width of the whitespace that separates DocumentLine objects something that can be configured?

Thanks,

Answer

@Vincent Villacorta The layout model API does not have a parameter for configuring the width of whitespace. In v2.1 there is an option to pass the reading order, but in v3.0 there is no such option, since natural reading order is used by all features.

In Form Recognizer v2.1, you can specify the order in which the text lines are output with the readingOrder query parameter. Use natural for a more human-friendly reading order output. This feature is only supported for Latin languages.
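As a rough illustration of the readingOrder query parameter described above, here is a minimal Python sketch that only builds the v2.1 Layout request URL. The helper name build_layout_url and the endpoint host are placeholders of mine, and the accepted values ("basic" and "natural") are my understanding of the v2.1 API, so please check them against the REST reference before relying on this.

```python
from urllib.parse import urlencode


def build_layout_url(endpoint: str, reading_order: str = "natural") -> str:
    """Build a v2.1 Layout analyze URL with the readingOrder query parameter.

    `endpoint` is your resource endpoint (placeholder below). The set of
    allowed values is an assumption based on the v2.1 documentation.
    """
    if reading_order not in ("basic", "natural"):
        raise ValueError("readingOrder is expected to be 'basic' or 'natural'")
    base = endpoint.rstrip("/") + "/formrecognizer/v2.1/layout/analyze"
    return base + "?" + urlencode({"readingOrder": reading_order})


# Placeholder endpoint; substitute your own resource name.
url = build_layout_url("https://example.cognitiveservices.azure.com")
print(url)
```

The document itself would then be sent in a POST request to that URL with your resource key in the request headers. Note that this parameter only affects the order in which lines are returned, not how words are grouped into lines.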

In Form Recognizer v3.0, the natural reading order output is used by the service in all cases. Therefore, there is no readingOrder parameter provided in this version.
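Since the service does not expose a whitespace-width setting, one possible client-side workaround is to regroup the returned words yourself using a gap threshold of your choosing. The sketch below is not part of the Form Recognizer API: regroup_by_gap, the (text, left, right) tuple shape, and the sample coordinates are all hypothetical, and it assumes you have already pulled each word's horizontal extent out of the result and that the words belong to the same visual row.

```python
from typing import List, Tuple

# Hypothetical shape: (text, left x, right x) for each word on one visual row.
Word = Tuple[str, float, float]


def regroup_by_gap(words: List[Word], max_gap: float) -> List[str]:
    """Join neighbouring words into one line unless the horizontal gap
    between them exceeds max_gap (same units as the coordinates)."""
    lines: List[str] = []
    current: List[str] = []
    prev_right = None
    for text, left, right in sorted(words, key=lambda w: w[1]):
        # Start a new line when the gap to the previous word is too wide.
        if prev_right is not None and left - prev_right > max_gap:
            lines.append(" ".join(current))
            current = []
        current.append(text)
        prev_right = right
    if current:
        lines.append(" ".join(current))
    return lines


# Made-up coordinates for illustration only.
row = [("Invoice", 0.5, 1.2), ("No.", 1.3, 1.6), ("Total", 4.0, 4.6), ("Due", 4.7, 5.0)]
print(regroup_by_gap(row, max_gap=0.5))
print(regroup_by_gap(row, max_gap=3.0))
```

A small max_gap keeps widely separated columns as separate lines, while a larger one merges them into a single line, which is roughly the knob the question was asking for.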

If you could share the document and results from the API, we could pass the feedback to the team for investigation.

If an answer is helpful, please accept or upvote it, which might help other community members reading this thread.
