Is the prebuilt invoice model for Form Recognizer set up only for US-format invoices?

Emily Harper 111 Reputation points
2021-11-04T11:27:48.053+00:00

Can you please confirm whether the prebuilt invoice model (https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-invoice) for Form Recognizer (part of Azure Cognitive Services) is specific to US-format invoices?

I see the documentation says it supports the US locale, but we have been trying it on non-US invoices and have seen some faults. A European invoice came in with the total written as "557,38", which in some countries means the same as "557.38": they use a comma as the decimal separator instead of a full stop. However, the model treated the comma as punctuation and gave us a total of "55738". Does this mean we cannot feed in any invoices that do not use a full stop as the decimal separator?

Is this going to be the same issue with dates? Will it assume they are all in MM/DD/YYYY, when in the rest of the world it is far more common for invoices like ours to use DD/MM/YYYY?

Thank you


Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. YutongTie-MSFT 46,991 Reputation points
    2021-11-05T02:25:20.267+00:00

    Hello @Emily Harper

    You are correct. The root cause of this issue is that the prebuilt invoice model currently supports only en-US invoices and typical US date formats. The invoice you provided is from the UK, which causes normalization to fail. We are working on language/locale expansion for the invoice model, but there is no exact ETA for en-GB support.

    If many of your invoices share the same date format, you can apply custom normalization during post-processing as a workaround.

    How hard this is depends on how many different date and price formats appear in your invoices. If there is a single format, it should be fairly easy to handle in any programming language. For example, the date "6th June 2021" can be handled by removing the ordinal suffix ("th") and then calling the regular DateTime.Parse; see the code below. If there are many different unsupported formats, it becomes much more complex.

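        using System;
        using System.Text.RegularExpressions;

        var s = "6th June 2021";
        // Remove an ordinal suffix only when it follows a digit, so words such as "August" stay intact.
        s = Regex.Replace(s, @"(?<=\d)(st|nd|rd|th)\b", "");
        Console.WriteLine(DateTime.Parse(s).ToLongDateString());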
    

    https://dotnetfiddle.net/l6gMvf
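
    The same kind of post-processing could cover the "557,38" total and DD/MM/YYYY dates from the question. The sketch below is only an illustration and is not part of the service or its SDK: the class InvoiceNormalizer and its methods are hypothetical names. It assumes that you read the raw text the service extracted for the field rather than the normalized value, and that every invoice uses a comma as the decimal separator, a full stop as the group separator, and DD/MM/YYYY dates.

    ```csharp
    using System;
    using System.Globalization;

    // Hypothetical helper for illustration; not part of any Azure SDK.
    public static class InvoiceNormalizer
    {
        // Assumption: amounts use "," for decimals and "." for thousands (e.g. "1.557,38").
        private static readonly NumberFormatInfo EuropeanNumbers = new NumberFormatInfo
        {
            NumberGroupSeparator = ".",
            NumberDecimalSeparator = ","
        };

        // Parses raw amount text such as "557,38" into a decimal.
        // Currency symbols are not handled and cause a FormatException.
        public static decimal ParseAmount(string rawText)
        {
            return decimal.Parse(rawText.Trim(), NumberStyles.Number, EuropeanNumbers);
        }

        // Parses raw date text that is known to be day-first, e.g. "04/11/2021".
        // The format is explicit, so the machine's culture does not affect the result.
        public static DateTime ParseDate(string rawText)
        {
            return DateTime.ParseExact(rawText.Trim(), "dd/MM/yyyy", CultureInfo.InvariantCulture);
        }
    }
    ```

    With these assumptions, InvoiceNormalizer.ParseAmount("557,38") returns 557.38 and InvoiceNormalizer.ParseDate("04/11/2021") returns 4 November 2021. Both methods throw a FormatException on text that does not match the assumed format, which is usually preferable to silently accepting a wrong value.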

    Hope this helps. Please let us know if you have any further questions.
