Azure Form Recognizer incorrect date conversion

Patrick 1 Reputation point
2021-07-09T14:15:51.58+00:00

I've spotted a problem with date conversion in the response coming from Form Recognizer (Invoice model 2.1.0).
The "InvoiceDate" field is correctly identified and OCR'ed but the resulting "valueDate" property is wrong.

"InvoiceDate": {
    "type": "date",
    "valueDate": "2021-06-01",
    "text": "6th June 2021",
    "confidence": 0.947,
    ...
}

It feels like a bug in the service rather than in the ML model, unless the service uses ML to convert the text to a date value.

Is this the right place to report bugs? The support link in Azure portal has led me here.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

2 answers

  1. YutongTie-MSFT 46,996 Reputation points
    2021-07-27T07:53:33.113+00:00

    Hello,

    The product group (PG) has acknowledged this issue and a fix will be rolling out. Thanks again for pointing it out.

    Regards,
    Yutong


  2. YutongTie-MSFT 46,996 Reputation points
    2021-07-27T23:38:24.02+00:00

    Hello,

    Here is an update on this issue:

    The root cause is that the invoice model currently supports only en-US invoices and typical US date formats. The provided invoice is from the UK, which causes normalization to fail. The product team is working on invoice language/locale expansion.

    If many of your invoices share the same date format, you can work around this with custom normalization during post-processing.

    How hard that is depends on how many different date/price formats appear in your invoices. If there is a single format, it should be fairly easy in any programming language. For instance, the example above can be solved by removing the ordinal suffix ("th") and parsing the result with DateTime.Parse, as in the code below. If there are many different unsupported formats, it becomes much more complex.
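    var s = "6th June 2021";
    // Strip the ordinal suffix only when it follows a day number, so month names such as "August" stay intact.
    s = System.Text.RegularExpressions.Regex.Replace(s, @"\b(\d{1,2})(st|nd|rd|th)\b", "$1");
    Console.WriteLine(DateTime.Parse(s).ToLongDateString());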

    https://dotnetfiddle.net/l6gMvf
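
    For the case where several date formats show up, one possible approach is to re-parse the field's "text" value against an explicit list of day-first formats. The following is only a minimal sketch under stated assumptions: the class name InvoiceDateNormalizer, the method TryNormalize, and the format list are hypothetical and are not part of the Form Recognizer SDK. It also assumes you have already extracted the "text" string from the response.

    ```csharp
    using System;
    using System.Globalization;
    using System.Text.RegularExpressions;

    public static class InvoiceDateNormalizer
    {
        // Hypothetical list of day-first formats; adjust it to the invoices you actually receive.
        private static readonly string[] DayFirstFormats =
        {
            "d MMMM yyyy", // 6 June 2021
            "d MMM yyyy",  // 6 Jun 2021
            "d/M/yyyy",    // 06/06/2021
            "d-M-yyyy",    // 06-06-2021
            "d.M.yyyy",    // 06.06.2021
        };

        // Matches an ordinal suffix only when it directly follows a one- or two-digit day number.
        private static readonly Regex OrdinalSuffix =
            new Regex(@"\b(\d{1,2})(st|nd|rd|th)\b", RegexOptions.IgnoreCase);

        // Returns true and the parsed date when the text matches one of the formats above.
        public static bool TryNormalize(string text, out DateTime date)
        {
            date = default(DateTime);
            if (string.IsNullOrWhiteSpace(text)) return false;

            string cleaned = OrdinalSuffix.Replace(text.Trim(), "$1");
            return DateTime.TryParseExact(
                cleaned, DayFirstFormats, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out date);
        }

        public static void Main()
        {
            foreach (string text in new[] { "6th June 2021", "1st August 2021", "06/07/2021", "June 2021" })
            {
                DateTime date;
                Console.WriteLine(TryNormalize(text, out date)
                    ? text + " -> " + date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                    : text + " -> not recognized");
            }
        }
    }
    ```

    With this list, "6th June 2021" comes out as 2021-06-06 and "06/07/2021" is read day-first as 2021-07-06. Anything that matches none of the formats returns false, so you can keep the service value or flag the field for manual review instead of guessing.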

    Regards,
    Yutong
