Data Types in Azure OCR Form Recognizer

Aurand, Josh 1 Reputation point
2021-07-13T19:53:56.99+00:00

Hi, I have a question about the data types (string, number, date, time, integer) and subtypes (e.g. for string: no-whitespaces, alphanumeric, not-specified) in the Azure OCR Form Recognizer.

Do they affect what value the recognizer actually reads and returns in the JSON? For example:

  • if the text read is "email name@ example website. com", it returns exactly that when I set the tag to String, not-specified, but with String, no-whitespaces it might return "emailname@examplewebsite.com"
  • if the text read is "13148", it returns 13148 when I set the tag to String, but with Date, dmy it might return "1/31/48"

Or do they only annotate the JSON, so that you can say "XX field is a date" and have an additional attribute to work with when using the JSON for other things?

I'm mainly asking because it would be great if the recognizer could remove whitespace from email addresses, but in the tag value preview it does not seem to do that.

Thanks!

Azure Document Intelligence

1 answer

  1. Ramr-msft 14,421 Reputation points
    2021-07-14T10:48:40.777+00:00

    @Aurand, Josh Thanks for the question. Could you please share a snapshot of what you are seeing? You can optionally set the expected data type for each tag: open the context menu to the right of a tag and select a type from the menu. This lets the detection algorithm make certain assumptions that improve text-detection accuracy. It also ensures that detected values are returned in a standardized format in the final JSON output.
    114585-image.png
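
To make the "standardized format in the final JSON output" concrete, here is a minimal Python sketch of how a client might read a typed field and, separately, strip whitespace from an email itself. It assumes the per-field layout of a Form Recognizer v2.1 analyze result, where each field carries a `type`, the raw OCR `text`, and a normalized `value<Type>` key such as `valueString` or `valueDate`. The field names, sample values, and helper functions are hypothetical, and the sketch does not show what the service returns for any particular subtype such as no-whitespaces.

```python
import re

# Hypothetical field entries, shaped like the per-field objects assumed to
# appear under documentResults[].fields in a v2.1 analyze result.
# Names and values are made up for illustration only.
sample_fields = {
    "Email": {
        "type": "string",
        "valueString": "email name@ example website. com",
        "text": "email name@ example website. com",
    },
    "InvoiceDate": {
        "type": "date",
        "valueDate": "2021-07-13",
        "text": "7/13/2021",
    },
}


def typed_value(field):
    """Return the normalized value<Type> entry if present, else the raw text."""
    if not field:
        return None
    type_name = field.get("type", "string")
    # "date" -> "valueDate", "string" -> "valueString", and so on.
    key = "value" + type_name[0].upper() + type_name[1:]
    return field.get(key, field.get("text"))


def strip_whitespace(text):
    """Client-side cleanup: remove every whitespace character."""
    return re.sub(r"\s+", "", text)


if __name__ == "__main__":
    email = typed_value(sample_fields["Email"])
    print(strip_whitespace(email))
    print(typed_value(sample_fields["InvoiceDate"]))
```

Doing the whitespace removal on the client keeps the result independent of whichever type or subtype is selected for the tag, at the cost of one extra post-processing step per field.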