Best practices and considerations for Document intelligence (preview) design

Overview

Document intelligence is a versatile foundation that applies across many processes and suits any workflow that involves document review. Whether you need a manual or automated way to oversee, handle, and validate documents within a subprocess, you can integrate this building block.

Document Intelligence Solution Architecture provides information on the architecture of the building block. This article explains the best practices and essential factors to consider when designing a solution that incorporates the Document Intelligence building block. By following these recommendations, you can adopt a configuration-first strategy, make the most of the building block's existing features, and discover customization avenues for business rules, data classification, verification, extraction, and integration to accommodate unique extension needs.

The extension can be in the form of the following new elements:

  • Configurations to introduce new documents or processing steps
  • Model-driven applications with reuse of existing controls, data models, and configurations
  • Power Automate flows to introduce new document types, extra steps, or business rules while processing the document
  • Custom AI Builder models for data extraction, enrichment, or verification
  • Data model elements and authorization rules for new personas
  • Integrations to other internal/external systems for document categorization, data extraction, verification, and to provide supporting information
  • Archival process for cost optimization, retention, and performance

The following diagram illustrates the Document intelligence extension areas available to implementers, using the built-in controls, configurations, and automation of Document intelligence together with the customization capabilities of Power Platform.

A diagram showing the extension areas of document intelligence building block

Download a printable PDF of this diagram.

This article explores the listed extension scenarios and presents an approach for implementing each one, along with considerations and best practices to keep in mind when designing these scenarios:

  1. Adding a new document type to Document Intelligence
  2. Adding Document Intelligence to an existing process in a model-driven app
  3. Adding a new enrichment step for agent validation
  4. Including additional business rules and data sources for document verification
  5. Mapping extracted information from the document
  6. Using your own AI model or integration with a solution for data extraction
  7. Integration with other master data systems for document management

Scenario-1: Adding a new document type (document definition) to Document Intelligence

Scenario-1: Requirement

Ability to add a new document type, such as an income statement, to automate document processing.

Scenario-1: Approach

You can create a new document definition record with an automatic flow by following the outlined steps.

  1. Configure the prerequisites
  2. Add Dataverse definitions (business scenario, document type, document definition, pipeline step definition, and optionally step field definition)
  3. Create a custom pipeline step flow
  4. Create a custom pipeline flow

If the new document doesn't require any automated flow, you only need to create a new document definition record as described here. You can create the new document type without defining any custom pipeline flow or custom pipeline step flow.

The Loan onboarding sample app solution ships sample custom pipeline flows, custom pipeline steps, and Document intelligence flow configuration for the Identity document (supported by ID reader). These sample flows can help you understand the extension path for new document types.

Scenario-1: Considerations and best practices

Configure Dataverse definitions

  • In Document intelligence, the definition records (Document definition, Pipeline step definition, Step field definition, and pipeline document state message tables) are solution-aware. We recommend you ship these definitions as part of your custom solution.
  • Each document definition can include only a single extraction step and a single enrichment step, and the Document user interface displays their output under the Extracted details tab.
  • The values displayed in the user interface (the steps' and fields' display names, the step states' descriptions, and the pipeline recommendations) can be defined as localizable to support a multi-language environment.
  • An enrichment step must always accompany an extraction step. After a step is defined as Enrichment, its output can be displayed in the Document management UI control as part of the Document preview, in the Extracted Details tab under Supporting information.
  • We recommend that you don't mark Required for approval fields as Read only in an extraction step.

Custom pipeline flow and step flow

  • If the extraction step is defined as the first step for the document definition using the Order column, the main pipeline automatically runs the step. You don't need to add the extraction step in the custom pipeline.
  • Each step can post the Output field in addition to the Raw Output field.
  • We recommend that you run the custom steps within the scope of try-catch blocks to provide final pipeline recommendations in all cases.
  • Validate that the built-in AI model supports the language, format, size, throttle limits, and scope of your requirement (for example, ID reader supports only passports and some valid US identity documents).
  • Plan performance testing scenarios for scalability, and watch for throttle limits and timeouts during data enrichment, verification, and other steps. If you see occasional failures of this kind, you can apply retry policies to the Power Automate actions, as shown in the following diagram and the sketch after it. Applying retry policies allows the flows to slow down instead of failing.

A diagram showing how to apply retry policies to Power Automate actions
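As a minimal sketch, the retry policy settings on a Power Automate action (configurable from the action's Settings pane) might look like the following. The policy type, count, and interval values here are illustrative assumptions, not recommendations.

// Illustrative retry policy for a Power Automate action (values are assumptions).
// An exponential policy retries the action with increasing delays instead of
// failing the flow on the first throttled (HTTP 429) or timed-out response.
const retryPolicy = {
    type: "exponential", // back off exponentially between attempts
    count: 4,            // retry up to 4 times
    interval: "PT20S"    // initial interval, as an ISO 8601 duration
};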

Other considerations and best practices

  • Make sure to design an archiving process for documents that controls the growth of the Annotation table, to optimize storage costs and maintain performance.
  • Uploaded documents are stored in the Note (Annotation) table, which uses Dataverse file storage for cost optimization. However, you can't host the documents outside of Dataverse for Document intelligence processing, nor can you use the native SharePoint server-based integration. To control data growth and optimize costs, we recommend archiving documents based on your business requirements. You can use the new approach of defining a solution-aware long-term data retention rule (preview) for the Notes entity, or take the legacy approach of defining a bulk-deletion job to remove old documents.
  • Review the supported file extensions and make sure to update the Supported file types environment variable accordingly to limit uploads to specific types.
  • For security and performance, we recommend limiting the size of the document. The default maximum size allowed for an attachment file is 5,120 KB (5 MB), and you can adjust the value with the "Set file size limit for attachments" option in system settings.
  • The current solution supports extraction of fields with Latin characters only; table extraction isn't yet supported.

Scenario-2: Adding Document Intelligence to an existing process in a model-driven app

Scenario-2: Requirement

Ability to add Document intelligence capabilities, such as document verification, review, and data extraction, to an existing process implemented using model-driven Power Apps, such as loan or account onboarding.

Scenario-2: Approach

The built-in capabilities are extensible by configuration to associate new process entities or an existing application entity with Document intelligence. The following entity-relationship diagram shows how the Application table in the onboarding essentials capability is associated with Document intelligence through the Document request table, using the Context field. If the Application entity doesn't address the needs of your existing process, you can create a relationship from your custom entity to the Document request table using the same Context field, because it's polymorphic. You can optionally add a relationship to the Related party table to link the primary person and bring in supplemental information. You can use the Dataverse API or the XrmToolBox community tool to add a relationship to existing polymorphic lookup fields.

A diagram showing an entity relationship diagram how to relate custom entities to introduce a new process

After the relationship with the Document request table is created, you can set the Context field to your custom entity record and set the Regarding field to the Related party record.
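As a minimal sketch, assuming placeholder schema and navigation property names (new_loanapplication, msfsi_documentrequests, msfsi_relatedparties, and so on) that you would replace with the actual names from your environment and solution, setting these lookups through the Dataverse Web API could look like this:

// Minimal sketch: bind a Document request row to a custom process record and a
// Related party record. All table, column, and navigation property names below
// are placeholders; use the schema names defined in your environment.
const apiUrl = "https://<org>.crm.dynamics.com/api/data/v9.2";

async function linkDocumentRequest(documentRequestId: string, accessToken: string): Promise<void> {
    const body = {
        // Polymorphic Context lookup: bind to the custom process record. The
        // navigation property name depends on the relationship you created.
        "msfsi_context_new_loanapplication@odata.bind":
            "/new_loanapplications(11111111-1111-1111-1111-111111111111)",
        // Regarding lookup: bind to the Related party record.
        "msfsi_regarding@odata.bind":
            "/msfsi_relatedparties(22222222-2222-2222-2222-222222222222)"
    };

    const response = await fetch(`${apiUrl}/msfsi_documentrequests(${documentRequestId})`, {
        method: "PATCH",
        headers: {
            "Content-Type": "application/json",
            "Authorization": `Bearer ${accessToken}`
        },
        body: JSON.stringify(body)
    });
    if (!response.ok) {
        throw new Error(`Updating the document request failed: ${response.status}`);
    }
}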

Scenario-2: Considerations and best practices

  • Assess the usability and extensibility of the existing Onboarding Essentials Application base data model before creating custom process tables, because it comes with built-in controls, workflows, and automation.

Scenario-3: Adding a new enrichment step for agent validation

Scenario-3: Requirement

Ability to add supporting information for the agent to verify the document. For example, you can add the salary classification to the extracted data display or add information like credit score to the extracted data for further use.

Scenario-3: Approach

To fulfill the requirement, you need to take the following steps. You can refer to the Enrichment step article for details.

1. Defining enrichment step in Pipeline step definition table

You need to define an enrichment step per document type in the Pipeline step definition table with type = Enrichment.

Following is the sample Enrichment configuration step shipped as part of the sample Document Intelligence for Identification document.

A diagram showing the sample Enrichment configuration step shipped as part of the sample Document Intelligence for Identification document

2. Defining enrichment fields in Step field definition table

You need to define each field to be introduced as supporting information. These fields can be read-only (by setting Read-Only = Yes) or can be made editable in the user interface (by setting Read-Only = No).

In the specified sample configuration, credit score is defined as the supporting information. You can use any information from Dataverse or other systems within the enrichment process flow.

3. Create a custom pipeline step flow

You need to create a new Power Automate flow, as detailed here, with a manual trigger and specific input fields. The flow should have actions to pull data from the source, whether from Dataverse or from an existing system of record. You then need to add a compose action to place these enrichment values in JSON format, as specified.

A diagram showing how to add a compose action in custom pipeline step flow to place these enrichment values in a json format

Lastly, this output information should be set in the Output field of the following action, which calls the Post pipeline step action core flow.

A diagram showing how to set output field of the below action to call Post pipeline step action core flow

4. Add the action to custom pipeline to call custom pipeline step flow

You need to add an action in the custom pipeline flow to call the custom pipeline step flow. Following is the snapshot from the sample Identification document pipeline flow that shows the actions calling custom pipeline step flows.

A diagram showing how to add an action in the custom pipeline flow to call the custom pipeline step flow

Scenario-3: Considerations and best practices

You can define one enrichment and one extraction step per document definition.

The enrichment value should be sent to the Post pipeline step action in JSON format, as specified:

export interface IEnrichmentOutput{ 
    fields: { 
        [field_external_id: string]: { 
            value: string; 
            originalValue: string; 
        } 
    } 
}
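For illustration, a compose action output conforming to this contract might look like the following. The credit_score field external ID and its value are assumptions based on the sample configuration.

// Illustrative enrichment output; the field external ID must match the
// Step field definition record (credit_score here is an assumption).
const enrichmentOutput: IEnrichmentOutput = {
    fields: {
        credit_score: {
            value: "725",
            originalValue: "725"
        }
    }
};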

Scenario-4: Including additional business rules and data sources for document verification

Scenario-4: Requirement

Ability to include additional business rules and data sources for document verification.

Scenario-4: Approach

Document verification can be performed either manually or automatically, depending on your business process for a specific document type. The built-in document flow allows you to add a custom pipeline step to establish an automated verification business rule. This rule can make use of the confidence score generated by the data extraction model, and it can also be extended to incorporate extracted data and information from internal or external verification sources.

To fulfill the requirement, you need to take the following steps. You can refer to the verification custom flow for details.

1. Define a new pipeline step definition

You need to define a new step per document type in the Pipeline step definition table with type = Other. You can define your thresholds for success and failure to decide whether to automatically approve the document, mark it as unclear, or fail the automation and leave the document for manual review.

Here's an example configuration for the verification step included in the sample Document Intelligence for Identification document.

A diagram showing a sample definition of a new pipeline step.

2. Create a custom pipeline step flow

You need to create a new Power Automate flow, as detailed here, with a manual trigger and specific input fields, including the data extraction output with confidence scores. You can use the confidence score alone to determine the document verification status, or also draw on other data sources or document verification systems to implement your business rules, as sketched below. The calculated document state information should be set in the Output field of the action that calls the Post pipeline step action core flow.
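As a minimal sketch (not the shipped flow logic), assuming the data extraction output shape shown in Scenario-5 and illustrative threshold values and state names that you would align with your Pipeline step definition and pipeline document state configuration, the verification rule could look like this:

// Illustrative verification rule based on the extraction confidence scores.
// Thresholds and state names are assumptions; align them with your own
// Pipeline step definition and pipeline document state configuration.
type VerificationResult = "Approved" | "Unclear" | "ManualReview";

function verifyDocument(extraction: IExtractionOutput): VerificationResult {
    const fieldConfidences = Object.values(extraction.fields).map((f) => f.confidence);
    const lowestConfidence = Math.min(extraction.collectionConfidence, ...fieldConfidences);

    if (lowestConfidence >= 0.85) {
        return "Approved";      // above the success threshold: approve automatically
    }
    if (lowestConfidence >= 0.6) {
        return "Unclear";       // between thresholds: mark the document as unclear
    }
    return "ManualReview";      // below the failure threshold: leave for manual review
}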

3. Add the action to custom pipeline to call custom pipeline step flow

You need to add an action in the custom pipeline flow to call the custom pipeline step flow created in the previous step. Following is a snapshot from the sample Identification document pipeline flow that shows the actions calling custom pipeline step flows. You can take the output of the verification step and reflect it in the pipeline document state that's displayed in the user interface.

A diagram showing a sample on how to add an action in the custom pipeline flow to call the custom pipeline step flow.

Scenario-4: Considerations and best practices

Each step must use the Pre-pipeline step and Post-pipeline step core flows to indicate its status, using the Step definition ID and the Pipeline ID from the parent flow. The custom pipeline flow that runs these steps gathers the results of all the steps and determines the complete pipeline status using the Post-pipeline core flow. All these step and pipeline results are displayed in the user interface.

A diagram showing how the custom pipeline results are displayed

Scenario-5: Mapping extracted information from the document

Scenario-5: Requirement

Ability to map the extracted information from the uploaded document to Dataverse tables or internal/external data sources.

Scenario-5: Approach

You can fulfill the requirement by extending your custom pipeline flow with additional actions to transform the data extraction output. The data extraction output can be accessed from the built-in Get pipeline details helper core flow, or you can retrieve the same information from the Raw output field of the Document pipeline step table. The JSON schema of the data extraction output is as follows:

 export interface IExtractionOutput { 
    pageCount: number; 
    collection: string; 
    collectionConfidence: number; 
    fields: { 
        [field_external_id: string]: { 
            value: string; 
            originalValue: string; 
            confidence: number; 
        }; 
    }; 
}

You can use a compose action, as specified, to parse this output and use the parsed information to feed Dataverse tables or internal/external data sources.

A diagram showing how to parse the extraction output information and use this information to feed Dataverse tables or internal/external data sources
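For illustration, a parsed extraction output and a simple field mapping might look like the following. The field external IDs (first_name, date_of_birth) and the target column names are assumptions, not the shipped configuration.

// Illustrative extraction output; field external IDs and values are assumptions.
const extractionOutput: IExtractionOutput = {
    pageCount: 1,
    collection: "passport",
    collectionConfidence: 0.97,
    fields: {
        first_name: { value: "Avery", originalValue: "AVERY", confidence: 0.95 },
        date_of_birth: { value: "1990-04-12", originalValue: "12 APR 1990", confidence: 0.91 }
    }
};

// Map the extracted values to the columns of a target Dataverse table
// (placeholder column names) before an update or create action.
const contactUpdate = {
    firstname: extractionOutput.fields["first_name"]?.value,
    birthdate: extractionOutput.fields["date_of_birth"]?.value
};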

Scenario-6: Using your own AI model or integration with a solution for data extraction

Scenario-6: Requirement

Ability to bring your own AI model for data extraction, or to integrate with an internal or external system for data extraction from the document.

Scenario-6: Approach

Start by taking a configuration-first or low-code approach: assess the prebuilt AI models or create a custom document processing model in AI Builder, and validate whether these built-in capabilities fulfill your data extraction requirements.

If you would like to bring your own AI model into AI Builder, you can follow the steps described in the AI Builder article and the details in the tutorial. The alternative path is to integrate the Document intelligence custom pipeline flow with your data extraction system directly.

To fulfill the requirement, you need to take the following steps.

1. Define extraction step in Pipeline step definition table

You need to define an extraction step per document type in the Pipeline step definition table with type = Extraction.

Following is the sample Extraction configuration for the Identification document type. If you're using a built-in model, you can set the AI Builder model; otherwise, you can leave this field blank.

A diagram showing the sample Extraction configuration for the Identification document type

2. Define extraction fields in Step field definition table

You need to define each field in the data extraction output.

3. Create a custom pipeline step flow

If you use the built-in AI models, you don't need to define a custom pipeline step flow or add additional actions in the custom pipeline flow. For custom models, however, you need to create a new Power Automate flow, as detailed here, with a manual trigger and specific input fields. The flow should have actions to pull data from the data extraction model or system. Then, you need to add a compose action to place the extraction values in JSON format, as specified:

export interface IExtractionOutput { 
    pageCount: number; 
    collection: string; 
    collectionConfidence: number; 
    fields: { 
        [field_external_id: string]: { 
            value: string; 
            originalValue: string; 
            confidence: number; 
        }; 
    }; 
}
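As a minimal sketch, assuming a hypothetical custom model or service that returns a flat list of predictions with confidences, the transformation into this contract could look like the following; the response shape and the collection confidence calculation are assumptions, not part of the shipped solution.

// Hypothetical response shape from your own extraction model or service.
interface ICustomModelPrediction {
    fieldId: string;     // must match the step field definition's external ID
    text: string;        // normalized value
    rawText: string;     // value as read from the document
    confidence: number;  // confidence between 0 and 1
}

// Transform the custom model response into the IExtractionOutput contract
// expected by the Post pipeline step action core flow.
function toExtractionOutput(
    predictions: ICustomModelPrediction[],
    pageCount: number,
    collection: string
): IExtractionOutput {
    const fields: IExtractionOutput["fields"] = {};
    for (const prediction of predictions) {
        fields[prediction.fieldId] = {
            value: prediction.text,
            originalValue: prediction.rawText,
            confidence: prediction.confidence
        };
    }
    const collectionConfidence = predictions.length > 0
        ? predictions.reduce((sum, p) => sum + p.confidence, 0) / predictions.length
        : 0;
    return { pageCount, collection, collectionConfidence, fields };
}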

Lastly, this output information should be set in the Output field of the following action, which calls the Post pipeline step action core flow.

A diagram showing how to set the extraction output to post pipeline

4. Add the action to custom pipeline to call custom pipeline step flow

You need to add an action in the custom pipeline flow to call the custom pipeline step flow created for data extraction. The output of the data extraction step flow is used to populate the extracted details, as specified. The confidence score of the model and of each field can be used for document verification and for setting the document status accordingly.

A diagram showing how the extraction output is displayed in the user interface

Scenario-6: Considerations and best practices

  • You need to consider whether the data extraction model can support the document language, format, and size.
  • You need to assess the reliability, performance, and accuracy of your own data extraction models before use. Make sure the throttle limits and throughput of the given model satisfy the needs of the business.
  • For the scenario of bringing your own AI model into AI Builder, review the limitations.

Scenario-7: Integration with other master data systems for document management

Scenario-7: Requirement

Ability to integrate with the master document management system for documents.

Scenario-7: Approach

Currently, Document intelligence uploads and reads documents from the Notes (Annotation) table in Dataverse, associated with the document request, so you need to get the documents into Dataverse for processing. You can integrate with two custom APIs (UploadDocument and DeleteDocument) to submit and delete documents, as illustrated in the following diagram.

A diagram showing the two custom APIs (UploadDocument and DeleteDocument) that can be used for integration to submit and delete documents

Download a printable PDF of this diagram.
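As a rough sketch only, calling an unbound Dataverse custom API from an external system follows the standard Web API pattern shown below. The API unique name and the request parameter names (DocumentRequestId, FileName, File) are placeholders, not the actual UploadDocument contract; check the custom API definition in your environment for the exact names and types.

// Rough sketch of calling the UploadDocument custom API through the Dataverse
// Web API. The unique name and request parameters below are placeholders;
// inspect the custom API definition for the actual contract.
const apiUrl = "https://<org>.crm.dynamics.com/api/data/v9.2";
const uploadApiName = "msfsi_UploadDocument"; // assumption: adjust to the API's unique name, including publisher prefix

async function uploadDocument(
    documentRequestId: string,
    fileName: string,
    base64Content: string,
    accessToken: string
): Promise<unknown> {
    const response = await fetch(`${apiUrl}/${uploadApiName}`, {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            "Authorization": `Bearer ${accessToken}`
        },
        body: JSON.stringify({
            DocumentRequestId: documentRequestId, // placeholder parameter name
            FileName: fileName,                   // placeholder parameter name
            File: base64Content                   // placeholder parameter name
        })
    });
    if (!response.ok) {
        throw new Error(`UploadDocument failed: ${response.status}`);
    }
    return response.json(); // the response may include generated IDs to record for traceability
}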

Scenario-7: Considerations and best practices

  • For any batch or async integrations, consider using the guidance shared in the integration design article.
  • Record the generated IDs (Annotation, Document, and Document Request) in your systems for traceability and integration with document intelligence.

See also

Next steps