question

EmilyHarper-9803 asked YutongTie-MSFT answered

Is the prebuilt invoice model in Form Recognizer set up only for US-format invoices?

Can you please confirm whether the prebuilt invoice model (https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-invoice) for Form Recognizer (part of Azure Cognitive Services) is specific to US-format invoices?

I see the documentation says it supports the US locale, but we have been trying it on non-US invoices and have seen some errors. A European invoice came in with the total text "557,38", which in some countries means "557.38", because they use a comma as the decimal separator instead of a full stop. However, the model treated the comma as punctuation (digit grouping) and gave us a total of "57738". Does this mean we cannot feed in any invoices that do not use a full stop as the decimal separator?
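For the amount problem described above, one post-processing option is to re-parse the raw total text with the separators you expect. A minimal C# sketch, assuming the raw string of the total (for example "557,38") is available to your code and that your invoices use a comma as the decimal separator and a full stop for grouping. The `ParseAmount` helper and the `commaDecimal` settings are hypothetical; an explicit `NumberFormatInfo` is used instead of a named culture so the separators are visible.

```csharp
using System;
using System.Globalization;

// Separators for a comma-decimal format such as "1.234,56" (an assumption;
// adjust to match the invoices you actually receive).
var commaDecimal = new NumberFormatInfo
{
    NumberDecimalSeparator = ",",
    NumberGroupSeparator = "."
};

// Hypothetical helper: parse the raw total text with the given separators.
// NumberStyles.Number allows surrounding whitespace, a sign,
// group separators and one decimal separator.
decimal ParseAmount(string raw, NumberFormatInfo format) =>
    decimal.Parse(raw.Trim(), NumberStyles.Number, format);

var total = ParseAmount("557,38", commaDecimal);

// Print with the invariant culture so downstream code always sees a full stop.
Console.WriteLine(total.ToString(CultureInfo.InvariantCulture)); // prints 557.38
```

This only helps when you know which convention a given invoice uses; it does not detect the format by itself.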

Will the same issue occur with dates? Will the model assume they are all in MM/DD/YYYY, when outside the US it is far more common for our invoices to use DD/MM/YYYY?
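The date concern is well founded: the same text can produce two valid but different dates depending on the assumed order, so a wrong assumption does not always cause a visible error. This short C# sketch parses one hypothetical value both ways with `DateTime.ParseExact` and the invariant culture (so "/" is matched as a literal slash and no machine-specific culture data is involved).

```csharp
using System;
using System.Globalization;

// Hypothetical extracted date text; it is valid under both orders.
const string raw = "03/04/2021";

// Read it as MM/DD/YYYY and as DD/MM/YYYY.
var asUs = DateTime.ParseExact(raw, "MM/dd/yyyy", CultureInfo.InvariantCulture);
var asUk = DateTime.ParseExact(raw, "dd/MM/yyyy", CultureInfo.InvariantCulture);

Console.WriteLine(asUs.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // prints 2021-03-04
Console.WriteLine(asUk.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // prints 2021-04-03
```

A value such as "13/04/2021" would fail under MM/DD/YYYY and be noticed, but values like the one above are silently misread.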

Thank you



azure-cognitive-services azure-form-recognizer

1 Answer

YutongTie-MSFT answered

Hello @EmilyHarper-9803

You are correct. The root cause of this issue is that the invoice model currently supports only en-US invoices and typical US date formats. The invoice you provided is from the UK, which causes normalization to fail. We are working on expanding the invoice model to more languages and locales, but there is no exact ETA for en-GB support yet.

If most of your invoices share the same date format, a workaround is to do custom normalization during post-processing.
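A minimal sketch of that post-processing for one known date format, in C#. It assumes the raw date text is available as a string and that your invoices use dd/MM/yyyy; `NormalizeDate` and the sample values are hypothetical. Using `DateTime.TryParseExact` with a single format means a value in any other layout returns null and can be flagged for review, rather than being silently reinterpreted.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical raw date strings taken from invoice fields before normalization.
var rawDates = new List<string> { "06/07/2021", "31/12/2021", "2021-12-31" };

// Accept only the one expected format; return null instead of guessing.
DateTime? NormalizeDate(string raw, string format)
{
    return DateTime.TryParseExact(raw.Trim(), format, CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, out DateTime parsed)
        ? parsed
        : (DateTime?)null;
}

foreach (var value in rawDates)
{
    DateTime? date = NormalizeDate(value, "dd/MM/yyyy");
    string shown = date is null
        ? "needs review"
        : date.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    Console.WriteLine($"{value}: {shown}");
}
// Prints:
// 06/07/2021: 2021-07-06
// 31/12/2021: 2021-12-31
// 2021-12-31: needs review
```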

It depends on how many different date and price formats are present in your invoices. If there is only one format, it should be fairly easy to handle in any programming language. For example, a date such as "6th June 2021" can be handled by removing the ordinal suffix ("th") and parsing the result with DateTime.Parse, as in the code below. If there are many different unsupported formats, it will be much more complex.

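     using System;
     using System.Globalization;
     using System.Text.RegularExpressions;
     var s = "6th June 2021";
     s = Regex.Replace(s, @"(?<=\d)(st|nd|rd|th)\b", ""); // strip ordinal suffixes only after a digit, so "August" or "Monday" stay intact
     Console.WriteLine(DateTime.Parse(s, CultureInfo.InvariantCulture).ToLongDateString());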

https://dotnetfiddle.net/l6gMvf
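When a batch mixes comma-decimal and full-stop-decimal totals, one possible approach is a separator heuristic, sketched below in C#. This is an assumption, not a Form Recognizer feature: the rightmost "," or "." is treated as the decimal separator only when one or two characters follow it, and every other separator is dropped as digit grouping. An amount like "1.234" is ambiguous and is read as 1234 here, so such values should still be reviewed. `NormalizeAmount` is a hypothetical helper.

```csharp
using System;
using System.Globalization;

// Heuristic sketch: assumes the input holds only digits, ',' or '.',
// and an optional leading '-'.
decimal NormalizeAmount(string raw)
{
    var s = raw.Trim();
    int last = s.LastIndexOfAny(new[] { ',', '.' });
    int fractionDigits = last < 0 ? 0 : s.Length - last - 1;
    bool hasDecimal = fractionDigits == 1 || fractionDigits == 2;

    // Everything before the decimal separator, with grouping separators removed.
    string integerPart = hasDecimal ? s.Substring(0, last) : s;
    integerPart = integerPart.Replace(",", "").Replace(".", "");

    // Rebuild the value with a full stop and parse it culture-independently.
    string normalized = hasDecimal ? $"{integerPart}.{s.Substring(last + 1)}" : integerPart;
    return decimal.Parse(normalized,
        NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
        CultureInfo.InvariantCulture);
}

foreach (var value in new[] { "557,38", "1.234,56", "1,234.56", "1,234" })
{
    // Prints 557.38, 1234.56, 1234.56 and 1234 on separate lines.
    Console.WriteLine(NormalizeAmount(value).ToString(CultureInfo.InvariantCulture));
}
```

If you know the convention per vendor or per country, parsing with that convention directly is safer than any heuristic.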

Hope this helps. Please let us know if you have any further questions.
