Extract dates and numbers from documents

While many extracted fields are plain text, in some cases the information to extract is a date or a number, such as an amount.

Importing this data into a target system can be cumbersome and often requires significant custom conversion logic. Most import connectors and APIs accept only dates normalized to the ISO 8601 format YYYY-MM-DD. They also accept only numbers that use a dot (.) as the decimal separator and have no thousands separator, such as NNN.DD.

To learn more about date format, go to ISO 8601 Date and time format.
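As a rough illustration of the normalized forms described above, the following Python sketch checks whether a string is already in the shape that such connectors typically expect. The function names are hypothetical and aren't part of any product API, and the number check assumes unsigned values only.

```python
import re
from datetime import date

def is_normalized_date(text):
    """True if text is a valid calendar date written as YYYY-MM-DD."""
    try:
        # Round-trip so that only the extended YYYY-MM-DD form is accepted.
        return date.fromisoformat(text).isoformat() == text
    except ValueError:
        return False

def is_normalized_number(text):
    """True if text uses digits only, with an optional dot decimal part."""
    # Assumption: unsigned values, no thousands separator, dot as decimal separator.
    return re.fullmatch(r"\d+(?:\.\d+)?", text) is not None
```

For example, is_normalized_date("2020-01-01") and is_normalized_number("1234.56") both return True, while "01/01/2020" and "1,234.56" are rejected.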

We’ve added the ability to declare the field type as a date or a number during the field creation step of the wizard, and to choose a date or number convention (equivalent to a locale).

Date conventions

The following example shows a mortgage statement with a date field.

Screenshot of the 'Date field' in the 'Overview'.

The following example shows date field formats.

Screenshot of 'Date field' formats.

Supported date formats

When defining the field, choose among Year, Month, Day; Month, Day, Year; or Day, Month, Year.

The following characters can be used as date delimiters: , - / . \. Whitespace can't be used as a delimiter. For example:

  • 01,01,2020
  • 01-01-2020
  • 01/01/2020

The day and month can each be written as one or two digits, and the year can be two or four digits:

  • 1-1-2020
  • 1-01-20

If a date string has eight digits, the delimiter is optional:

  • 01012020
  • 01 01 2020

The month can also be written as its full or short name. If the name is used, delimiter characters are optional. However, this format may be recognized less accurately than others.

  • 01/Jan/2020
  • 01Jan2020
  • 01 Jan 2020
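To make the date rules above more concrete, here is a minimal Python sketch of how text could be converted to ISO 8601 for a chosen convention. It isn't the product's implementation. The function name normalize_date and the order codes "YMD", "MDY", and "DMY" are hypothetical, and mapping two-digit years to the 2000s is an assumption. The sketch handles delimited dates and eight-digit dates only. It doesn't handle whitespace-separated forms or month names written without a delimiter.

```python
import re
from datetime import date

# Delimiters listed in the text: , - / . \
_DELIMS = r"[,\-/.\\]"
_MONTHS = {name: i + 1 for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"])}

def normalize_date(text, order):
    """Return an ISO 8601 date string, or "" if the text can't be converted.

    order is "YMD", "MDY", or "DMY" (hypothetical codes for the conventions).
    """
    text = text.strip()
    if re.fullmatch(r"\d{8}", text):
        # Eight digits without delimiters: split by position.
        widths = {"Y": 4, "M": 2, "D": 2}
        parts, pos = [], 0
        for key in order:
            parts.append(text[pos:pos + widths[key]])
            pos += widths[key]
    else:
        parts = re.split(_DELIMS, text)
    if len(parts) != 3:
        return ""
    fields = dict(zip(order, parts))
    if len(fields["Y"]) not in (2, 4):
        return ""
    try:
        month_text = fields["M"].lower()
        # Full or short month names are matched on their first three letters.
        month = _MONTHS[month_text[:3]] if month_text.isalpha() else int(month_text)
        year = int(fields["Y"])
        if len(fields["Y"]) == 2:
            year += 2000  # Assumption: two-digit years fall in the 2000s.
        return date(year, month, int(fields["D"])).isoformat()
    except (KeyError, ValueError):
        return ""
```

With the Day, Month, Year convention, normalize_date("1-01-20", "DMY") and normalize_date("01/Jan/2020", "DMY") both give 2020-01-01. An invalid calendar date such as 2020.02.30 gives an empty string.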

Number conventions

The following example shows a mortgage statement with number fields.

Screenshot of the 'Number field' in the 'Overview'.

The following example shows number field formats.

Screenshot of the 'Number field' formats.

Note

Only one convention is allowed for a given field across all the collections of this model. For instance, if you extract an amount field by selecting Use comma (,) as decimal separator, the text 1234,56 or 1 234,56 is converted to 1234.56. Amounts formatted as 12,34,576.78 or 1,234.56 aren't converted.

During extraction, the text is automatically converted according to the convention provided. The converted value can be retrieved using the YOURFIELDNAME value result. This value is empty if the conversion isn't possible. The original text can be retrieved using the YOURFIELDNAME text result.

Supported number formats

When defining the field, choose either Use dot (.) as decimal separator or Use comma (,) as decimal separator.

When the decimal separator is a dot (.), the thousands separator can be omitted, or a comma (,) or whitespace can be used as the thousands separator. For example:

  • 1234.56
  • 1,234.56
  • 1 234.56

When the decimal separator is a comma (,), the thousands separator can be omitted, or whitespace can be used as the thousands separator. For example:

  • 1234,56
  • 1 234,56
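The two number conventions can be sketched in Python as follows. This is an illustrative approximation, not the product's implementation. The function name normalize_number is hypothetical, and returning an empty string mirrors the empty value described earlier for text that can't be converted. The sketch assumes unsigned amounts and a single space as the whitespace thousands separator.

```python
import re

def normalize_number(text, decimal_sep):
    """Return the number with '.' as decimal separator and no thousands
    separator, or "" if the text doesn't match the chosen convention."""
    if decimal_sep == ".":
        thousands = [",", " "]   # Dot convention: comma or space groups.
    elif decimal_sep == ",":
        thousands = [" "]        # Comma convention: space groups only.
    else:
        raise ValueError("decimal_sep must be '.' or ','")
    dec = re.escape(decimal_sep)
    text = text.strip()
    # First pattern: plain digits with an optional decimal part.
    patterns = [rf"(\d+)(?:{dec}(\d+))?"]
    # Further patterns: groups of three digits split by one separator.
    for sep in thousands:
        patterns.append(
            rf"(\d{{1,3}}(?:{re.escape(sep)}\d{{3}})+)(?:{dec}(\d+))?")
    for pattern in patterns:
        m = re.fullmatch(pattern, text)
        if m:
            integer = re.sub(r"\D", "", m.group(1))  # Drop the separators.
            return integer + ("." + m.group(2) if m.group(2) else "")
    return ""
```

With the comma convention, normalize_number("1 234,56", ",") gives 1234.56, while 1,234.56 and 12,34,576.78 give an empty string, which matches the behavior described in the note.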

Next step

Train and publish your document processing model

See also

Training: Process custom documents with AI Builder (module)