Keep newlines in detected texts in Azure Form Recognizer

Cyril Carraz 31 Reputation points
2023-02-20T08:51:03.1666667+00:00

Hello,

I'm training a custom neural model in Azure Form Recognizer v3.0. Some fields in the annex are long texts (made up of many paragraphs, bullet lists, etc.), and these texts may span multiple pages.

The model detects these texts easily, but it sometimes (and only sometimes) removes the newline characters (\n) from the value/content of the field. How do I prevent that?

I tried relabeling these texts as a table with a single column, so that each line becomes a single row, and joining the rows with "\n" in a later postprocessing step. That made the detection much worse: sometimes it even skips certain lines or paragraphs (or worse, it skips certain dates for some reason).

How do I parse the text and keep it exactly as is, line by line?
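
To make the kind of postprocessing I have in mind concrete, here is a minimal sketch, not a confirmed fix. It assumes the analyze result is available as a plain dict shaped like the v3.0 JSON response (a top-level "content" string, "pages" with "lines", and "spans" given as offset/length pairs), that the extracted field carries its own "spans", and that offsets are counted in Unicode code points so that Python string slicing lines up with them. The function names and the sample data are hypothetical. The idea is to ignore the field's own content and rebuild the text from the OCR lines that overlap the field's spans:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Return the intersection of two half-open ranges, or None if empty."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return (start, end) if start < end else None


def field_text_with_newlines(analyze_result, field):
    """Rebuild a field's text line by line from the OCR lines it overlaps.

    Assumption: `analyze_result` and `field` are plain dicts with
    "content", "pages" -> "lines" -> "spans", and field "spans",
    where each span is {"offset": int, "length": int}.
    """
    content = analyze_result["content"]
    field_spans = [
        (s["offset"], s["offset"] + s["length"]) for s in field.get("spans", [])
    ]

    pieces = []
    for page in analyze_result.get("pages", []):
        for line in page.get("lines", []):
            for span in line.get("spans", []):
                line_start = span["offset"]
                line_end = line_start + span["length"]
                for field_start, field_end in field_spans:
                    hit = overlap(line_start, line_end, field_start, field_end)
                    if hit:
                        # Clip the line to the part that belongs to the field.
                        pieces.append((hit[0], content[hit[0]:hit[1]]))

    # Restore reading order by offset, then put one OCR line per output line.
    pieces.sort(key=lambda piece: piece[0])
    return "\n".join(text for _, text in pieces)


if __name__ == "__main__":
    # Hypothetical, hand-made sample mimicking the assumed response shape.
    sample = {
        "content": "Title\nFirst line\nSecond line\nFooter",
        "pages": [
            {
                "lines": [
                    {"content": "Title", "spans": [{"offset": 0, "length": 5}]},
                    {"content": "First line", "spans": [{"offset": 6, "length": 10}]},
                    {"content": "Second line", "spans": [{"offset": 17, "length": 11}]},
                    {"content": "Footer", "spans": [{"offset": 29, "length": 6}]},
                ]
            }
        ],
    }
    annex_field = {"spans": [{"offset": 6, "length": 22}]}
    print(field_text_with_newlines(sample, annex_field))
```

Because a field that continues on another page would have more than one span, the sketch loops over all field spans instead of assuming a single one. It only restores line breaks between OCR lines; it does not recover blank lines or paragraph spacing.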

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.