Azure AI Document Intelligence is an Azure AI service that lets you build automated data processing software using machine-learning technology. Document Intelligence enables you to identify and extract text, key/value pairs, selection marks, table data, and more from your documents. The results are delivered as structured data that includes the relationships in the original file.
In this article you learn how to download, install, and run Document Intelligence containers. Containers enable you to run the Document Intelligence service in your own environment. Containers are great for specific security and data governance requirements.
Document Intelligence v3.0 containers support the Read, Layout, General Document, ID Document, Receipt, Invoice, and Custom models. The Business Card model is currently supported only in the v2.1 containers.
Important
Document Intelligence v3.0 containers are now generally available. If you are getting started with containers, consider using the v3 containers.
In version 2.1, six Document Intelligence feature containers support the Layout, Business Card, ID Document, Receipt, Invoice, and Custom models. The Receipt, Business Card, and ID Document containers also require the Read OCR container.
You also need the following to use Document Intelligence containers:
| Required | Purpose |
|---|---|
| Familiarity with Docker | You should have a basic understanding of Docker concepts, like registries, repositories, containers, and container images, as well as knowledge of basic Docker terminology and commands. |
| Docker Engine installed | You need the Docker Engine installed on a host computer. Docker provides packages that configure the Docker environment on macOS, Windows, and Linux. For a primer on Docker and container basics, see the Docker overview. Docker must be configured to allow the containers to connect with and send billing data to Azure. On Windows, Docker must also be configured to support Linux containers. |
| Document Intelligence resource | A single-service Azure AI Document Intelligence resource or a multi-service Azure AI services resource in the Azure portal. To use the containers, you must have the associated key and endpoint URI. Both values are available on the Azure portal Document Intelligence Keys and Endpoint page: <br>• {FORM_RECOGNIZER_KEY}: one of the two available resource keys.<br>• {FORM_RECOGNIZER_ENDPOINT_URI}: the endpoint for the resource used to track billing information. |
| Optional | Purpose |
|---|---|
| Azure CLI (command-line interface) | The Azure CLI enables you to use a set of online commands to create and manage Azure resources. It's available to install in Windows, macOS, and Linux environments and can be run in a Docker container and Azure Cloud Shell. |
You also need an Azure AI Vision API resource to process business cards, ID documents, or receipts.

- You can access the Recognize Text feature as either an Azure resource (the REST API or SDK) or a cognitive-services-recognize-text container.
- If you use the cognitive-services-recognize-text container, make sure that the Azure AI Vision key for the Document Intelligence container is the key specified in the Azure AI Vision docker run or docker compose command for the cognitive-services-recognize-text container, and that your billing endpoint is the container's endpoint (for example, http://localhost:5000).
- If you use both the Azure AI Vision container and the Document Intelligence container together on the same host, they can't both be started with the default port of 5000.

Pass in both the key and endpoint for your Azure AI Vision Azure cloud or Azure AI container:

- {COMPUTER_VISION_KEY}: one of the two available resource keys.
- {COMPUTER_VISION_ENDPOINT_URI}: the endpoint for the resource used to track billing information.
Host computer requirements
The host is an x64-based computer that runs the Docker container. It can be a computer on your premises or a Docker hosting service in Azure, such as Azure Container Instances (ACI) or Azure Kubernetes Service (AKS).
The following tables list the supporting container(s) for each Document Intelligence container you download. For more information, see the Billing section.

Document Intelligence v2.1 containers:

| Feature container | Supporting container(s) |
|---|---|
| Layout | Not required |
| Business Card | Azure AI Vision Read |
| ID Document | Azure AI Vision Read |
| Invoice | Layout |
| Receipt | Azure AI Vision Read |
| Custom | Custom API, Custom Supervised, Layout |
Document Intelligence v3.0 containers:

| Feature container | Supporting container(s) |
|---|---|
| Read | Not required |
| Layout | Not required |
| Business Card | Read |
| General Document | Layout |
| Invoice | Layout |
| Receipt | Read or Layout |
| ID Document | Read |
| Custom Template | Layout |
Recommended CPU cores and memory
Note
The minimum and recommended values are based on Docker limits and not the host machine resources.
Document Intelligence containers
| Container | Minimum | Recommended |
|---|---|---|
| Read | 8 cores, 10-GB memory | 8 cores, 24-GB memory |
| Layout | 8 cores, 16-GB memory | 8 cores, 24-GB memory |
| Business Card | 8 cores, 16-GB memory | 8 cores, 24-GB memory |
| General Document | 8 cores, 12-GB memory | 8 cores, 24-GB memory |
| ID Document | 8 cores, 8-GB memory | 8 cores, 24-GB memory |
| Invoice | 8 cores, 16-GB memory | 8 cores, 24-GB memory |
| Receipt | 8 cores, 11-GB memory | 8 cores, 24-GB memory |
| Custom Template | 8 cores, 16-GB memory | 8 cores, 24-GB memory |
Read, Layout, and prebuilt containers
| Container | Minimum | Recommended |
|---|---|---|
| Read 3.2 | 8 cores, 16-GB memory | 8 cores, 24-GB memory |
| Layout 2.1 | 8 cores, 16-GB memory | 8 cores, 24-GB memory |
| Business Card 2.1 | 2 cores, 4-GB memory | 4 cores, 4-GB memory |
| ID Document 2.1 | 1 core, 2-GB memory | 2 cores, 2-GB memory |
| Invoice 2.1 | 4 cores, 8-GB memory | 8 cores, 8-GB memory |
| Receipt 2.1 | 4 cores, 8-GB memory | 8 cores, 8-GB memory |
Custom containers
The following host machine requirements are applicable to train and analyze requests:
| Container | Minimum | Recommended |
|---|---|---|
| Custom API | 0.5 cores, 0.5-GB memory | 1 core, 1-GB memory |
| Custom Supervised | 4 cores, 2-GB memory | 8 cores, 4-GB memory |
Each core must be at least 2.6 gigahertz (GHz) or faster.
Core and memory correspond to the --cpus and --memory settings, which are used as part of the docker compose or docker run command.
Tip
You can use the docker images command to list your downloaded container images. For example, the following command lists the ID, repository, and tag of each downloaded container image, formatted as a table:
```
docker images --format "table {{.ID}}\t{{.Repository}}\t{{.Tag}}"
```

```
IMAGE ID            REPOSITORY                  TAG
<image-id>          <repository-path/name>      <tag-name>
```
Run the container with the docker-compose up command
Replace the {ENDPOINT_URI} and {API_KEY} values with your resource Endpoint URI and the key from the Azure resource page.
Ensure that the EULA value is set to accept.
The EULA, Billing, and ApiKey values must be specified; otherwise the container can't start.
Important
The keys are used to access your Document Intelligence resource. Do not share your keys. Store them securely, for example, using Azure Key Vault. We also recommend regenerating these keys regularly. Only one key is necessary to make an API call. When regenerating the first key, you can use the second key for continued access to the service.
The following code sample is a self-contained docker compose example to run the Document Intelligence Layout container. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration. Enter {FORM_RECOGNIZER_ENDPOINT_URI} and {FORM_RECOGNIZER_KEY} values for your Layout container instance.
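The stripped-down docker-compose.yml sketch below illustrates the shape of such a file. The image path, tag, and service name are assumptions; confirm the exact image name and tag in the Microsoft Container Registry before you use them. The EULA, billing, and apiKey variables correspond to the required settings described earlier.

```yaml
version: "3.9"
services:
  azure-cognitive-service-layout:
    # Image path and tag are assumptions; verify them in the Microsoft Container Registry (MCR).
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/layout-3.0:latest
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    ports:
      - "5000:5000"
```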
Now, you can start the service with the docker compose command:
docker-compose up
The following code sample is a self-contained docker compose example to run the Document Intelligence General Document container. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration. Enter {FORM_RECOGNIZER_ENDPOINT_URI} and {FORM_RECOGNIZER_KEY} values for your General Document and Layout container instances.
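A minimal sketch of such a compose file follows. The image paths, tags, and service names are assumptions to verify in the Microsoft Container Registry. The AzureCognitiveServiceLayoutHost variable, described later in the settings table, points the General Document container at the Layout container.

```yaml
version: "3.9"
services:
  azure-cognitive-service-layout:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/layout-3.0:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    ports:
      - "5000:5000"

  azure-cognitive-service-document:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/document-3.0:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
      # Point the General Document container at the Layout container by service name.
      - AzureCognitiveServiceLayoutHost=http://azure-cognitive-service-layout:5000
    ports:
      # The two containers can't share the same host port.
      - "5001:5000"
    depends_on:
      - azure-cognitive-service-layout
```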
Now, you can start the service with the docker compose command:
docker-compose up
Given the resources on the machine, the General Document container might take some time to start up.
The following code sample is a self-contained docker compose example to run the Document Intelligence Invoice container. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration. Enter {FORM_RECOGNIZER_ENDPOINT_URI} and {FORM_RECOGNIZER_KEY} values for your Invoice and Layout container instances.
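The sketch below shows one way to lay out that compose file; the image paths and tags are assumptions to verify in the Microsoft Container Registry.

```yaml
version: "3.9"
services:
  azure-cognitive-service-layout:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/layout-3.0:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    ports:
      - "5000:5000"

  azure-cognitive-service-invoice:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/invoice-3.0:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
      - AzureCognitiveServiceLayoutHost=http://azure-cognitive-service-layout:5000
    ports:
      - "5001:5000"
    depends_on:
      - azure-cognitive-service-layout
```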
Now, you can start the service with the docker compose command:
docker-compose up
The following code sample is a self-contained docker compose example to run the Document Intelligence Receipt and Read containers together. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration. Enter {FORM_RECOGNIZER_ENDPOINT_URI} and {FORM_RECOGNIZER_KEY} values for your Receipt and Read container instances.
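A minimal sketch under stated assumptions: the image paths and tags need to be verified in the Microsoft Container Registry, and the AzureCognitiveServiceReadHost variable name is an assumption modeled on the AzureCognitiveServiceLayoutHost setting documented later in this article.

```yaml
version: "3.9"
services:
  azure-cognitive-service-read:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/read-3.0:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    ports:
      - "5000:5000"

  azure-cognitive-service-receipt:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/receipt-3.0:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
      # Variable name is an assumption modeled on AzureCognitiveServiceLayoutHost.
      - AzureCognitiveServiceReadHost=http://azure-cognitive-service-read:5000
    ports:
      - "5001:5000"
    depends_on:
      - azure-cognitive-service-read
```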
Now, you can start the service with the docker compose command:
docker-compose up
The following code sample is a self-contained docker compose example to run the Document Intelligence ID Document and Read containers together. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration. Enter {FORM_RECOGNIZER_ENDPOINT_URI} and {FORM_RECOGNIZER_KEY} values for your ID Document and Read container instances.
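The sketch below follows the same pattern as the Receipt example; the image paths, tags, and the AzureCognitiveServiceReadHost variable name are assumptions to verify against the Microsoft Container Registry and the container documentation.

```yaml
version: "3.9"
services:
  azure-cognitive-service-read:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/read-3.0:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    ports:
      - "5000:5000"

  azure-cognitive-service-id-document:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/id-document-3.0:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
      # Variable name is an assumption modeled on AzureCognitiveServiceLayoutHost.
      - AzureCognitiveServiceReadHost=http://azure-cognitive-service-read:5000
    ports:
      - "5001:5000"
    depends_on:
      - azure-cognitive-service-read
```

As with the other examples, you can then start both services with the docker-compose up command.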
To run the Custom Template and Studio containers, first create the following folders on your local machine. You need the path of each folder for your .env file.

- Create a folder to store the files to be analyzed or used for training. Name this folder files. We reference the file path for this folder as {FILE_MOUNT_PATH}. Copy the file path to a convenient location; you need to add it to your .env file. As an example, if the folder is called files and is located in the same folder as the docker-compose file, the .env file entry is FILE_MOUNT_PATH="./files".
- Create a folder to store the logs written by the Document Intelligence service. Name this folder output. We reference the file path for this folder as {OUTPUT_MOUNT_PATH}. As an example, if the folder is called output and is located in the same folder as the docker-compose file, the .env file entry is OUTPUT_MOUNT_PATH="./output".
- Create a folder for internal processing shared between the containers. Name this folder shared. We reference the file path for this folder as {SHARED_MOUNT_PATH}. As an example, if the folder is called shared and is located in the same folder as the docker-compose file, the .env file entry is SHARED_MOUNT_PATH="./shared".
- Create a folder for the Studio to store project-related information. Name this folder db. We reference the file path for this folder as {DB_MOUNT_PATH}. As an example, if the folder is called db and is located in the same folder as the docker-compose file, the .env file entry is DB_MOUNT_PATH="./db".
The following code sample is a self-contained docker compose example to run the Document Intelligence Layout, Studio, and Custom Template containers together. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration.
The Custom Template container can use Azure Storage queues or in-memory queues. The Storage:ObjectStore:AzureBlob:ConnectionString and Queue:Azure:ConnectionString environment variables only need to be set if you're using Azure Storage queues. When running locally, delete these variables.
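The sketch below shows one way such a compose file can be laid out. The image paths, tags, Studio port, and in-container mount targets are all assumptions to confirm against the Microsoft Container Registry and the container documentation. The ${...} values are read from the .env file you populated above.

```yaml
version: "3.9"
services:
  azure-cognitive-service-layout:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/layout-3.0:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    volumes:
      - ${SHARED_MOUNT_PATH}:/share   # in-container target path is an assumption
      - ${OUTPUT_MOUNT_PATH}:/logs    # in-container target path is an assumption
    ports:
      - "5000:5000"

  azure-cognitive-service-custom-template:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/custom-template-3.0:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
      - AzureCognitiveServiceLayoutHost=http://azure-cognitive-service-layout:5000
      # Uncomment only if you use Azure Storage queues; delete for in-memory queues.
      # - Queue:Azure:ConnectionString={QUEUE_CONNECTION_STRING}
      # - Storage:ObjectStore:AzureBlob:ConnectionString={BLOB_CONNECTION_STRING}
    volumes:
      - ${SHARED_MOUNT_PATH}:/share   # in-container target path is an assumption
      - ${OUTPUT_MOUNT_PATH}:/logs    # in-container target path is an assumption
    ports:
      - "5001:5000"
    depends_on:
      - azure-cognitive-service-layout

  azure-cognitive-service-studio:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/studio:3.0  # assumption; verify in MCR
    environment:
      - EULA=accept
    volumes:
      - ${FILE_MOUNT_PATH}:/onprem_folder   # in-container target path is an assumption
      - ${DB_MOUNT_PATH}:/onprem_db         # in-container target path is an assumption
    ports:
      - "5002:5001"   # Studio port is an assumption
```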
Ensure the service is running
To ensure that the service is up and running, run these commands in an Ubuntu shell:

```
cd <folder containing the docker-compose file>
source .env
docker-compose up
```
Custom Template containers require a few additional configuration settings and support other optional settings:
| Setting | Required | Description |
|---|---|---|
| EULA | Yes | License acceptance. Example: Eula=accept |
| Billing | Yes | Billing endpoint URI of the Document Intelligence resource |
| ApiKey | Yes | The key of the Document Intelligence resource |
| Queue:Azure:ConnectionString | No | Azure Queue connection string |
| Storage:ObjectStore:AzureBlob:ConnectionString | No | Azure Blob connection string |
| HealthCheck:MemoryUpperboundInMB | No | Memory threshold for reporting unhealthy to liveness. Default: same as recommended memory |
| StorageTimeToLiveInMinutes | No | TTL duration to remove all intermediate and final files. Default: two days. The TTL can be set between five minutes and seven days |
| Task:MaxRunningTimeSpanInMinutes | No | Maximum running time before a request is treated as a timeout. Default: 60 minutes |
| HTTP_PROXY_BYPASS_URLS | No | Specify URLs for bypassing the proxy. Example: HTTP_PROXY_BYPASS_URLS = abc.com, xyz.com |
| AzureCognitiveServiceLayoutHost | | Specify the Layout container URI. Example: AzureCognitiveServiceLayoutHost=http://onprem-frlayout:5000 |
Use the Document Intelligence Studio to train a model
Gather a set of at least five forms of the same type. You use this data to train the model and test a form. You can use a sample data set (download and extract sample_data.zip).
1. Once you confirm that the containers are running, open a browser and navigate to the endpoint where you have the containers deployed. If this deployment is your local machine, the endpoint is [http://localhost:5000](http://localhost:5000).
2. Select the custom extraction model tile.
3. Select the Create project option.
4. Provide a project name and, optionally, a description.
5. On the "configure your resource" step, provide the endpoint to your custom template model. If you deployed the containers on your local machine, use this URL: [http://localhost:5000](http://localhost:5000).
6. Provide a subfolder for where your training data is located within the files folder.
7. Finally, create the project.

You should now have a project created and ready for labeling. Upload your training data and get started labeling. If you're new to labeling, see build and train a custom model.
Using the API to train
If you plan to call the APIs directly to train a model, the custom template model train API requires a base64 encoded zip file that is the contents of your labeling project. You can omit the PDF or image files and submit only the JSON files.
Once you have your dataset labeled and *.ocr.json, *.labels.json and fields.json files added to a zip, use the PowerShell commands to generate the base64 encoded string.
Document Intelligence v2.1 doesn't support the Read container.
Document Intelligence v2.1 doesn't support the General Document container.
The following code sample is a self-contained docker compose example to run the Document Intelligence Layout container. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration. Enter {FORM_RECOGNIZER_ENDPOINT_URI} and {FORM_RECOGNIZER_KEY} values for your Layout container instance.
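A minimal v2.1-style sketch is shown below; the image path and tag are assumptions to verify in the Microsoft Container Registry, and you may want to pin a specific tag instead of latest.

```yaml
version: "3.9"
services:
  azure-cognitive-service-layout:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/layout:latest  # assumption; verify image name and tag in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    ports:
      - "5000:5000"
```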
Now, you can start the service with the docker compose command:
docker-compose up
The following code sample is a self-contained docker compose example to run Document Intelligence Invoice and Layout containers together. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration. Enter {FORM_RECOGNIZER_ENDPOINT_URI} and {FORM_RECOGNIZER_KEY} values for your Invoice and Layout containers.
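A sketch of such a compose file follows; the v2.1 image paths and tags are assumptions to verify in the Microsoft Container Registry.

```yaml
version: "3.9"
services:
  azure-cognitive-service-layout:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/layout:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    ports:
      - "5000:5000"

  azure-cognitive-service-invoice:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/invoice:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
      - AzureCognitiveServiceLayoutHost=http://azure-cognitive-service-layout:5000
    ports:
      - "5001:5000"
    depends_on:
      - azure-cognitive-service-layout
```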
Now, you can start the service with the docker compose command:
docker-compose up
The following code sample is a self-contained docker compose example to run Document Intelligence Receipt and Read containers together. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration. Enter {FORM_RECOGNIZER_ENDPOINT_URI} and {FORM_RECOGNIZER_KEY} values for your Receipt container. Enter {COMPUTER_VISION_ENDPOINT_URI} and {COMPUTER_VISION_KEY} values for your Azure AI Vision Read container.
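The sketch below pairs the Receipt container with the Azure AI Vision Read container; the image paths, tags, and the AzureCognitiveServiceReadHost variable name are assumptions to verify in the Microsoft Container Registry and the container documentation. Note that the Read container is billed against the Azure AI Vision resource.

```yaml
version: "3.9"
services:
  azure-cognitive-service-read:
    image: mcr.microsoft.com/azure-cognitive-services/vision/read:3.2  # assumption; verify image name and tag in MCR
    environment:
      - EULA=accept
      - billing={COMPUTER_VISION_ENDPOINT_URI}
      - apiKey={COMPUTER_VISION_KEY}
    ports:
      - "5000:5000"

  azure-cognitive-service-receipt:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/receipt:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
      # Variable name is an assumption modeled on AzureCognitiveServiceLayoutHost.
      - AzureCognitiveServiceReadHost=http://azure-cognitive-service-read:5000
    ports:
      - "5001:5000"
    depends_on:
      - azure-cognitive-service-read
```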
Now, you can start the service with the docker compose command:
docker-compose up
The following code sample is a self-contained docker compose example to run Document Intelligence ID Document and Read containers together. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration. Enter {FORM_RECOGNIZER_ENDPOINT_URI} and {FORM_RECOGNIZER_KEY} values for your ID Document container. Enter {COMPUTER_VISION_ENDPOINT_URI} and {COMPUTER_VISION_KEY} values for your Azure AI Vision Read container.
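The sketch below follows the same pattern as the Receipt example; image paths, tags, and the AzureCognitiveServiceReadHost variable name are assumptions to verify in the Microsoft Container Registry and the container documentation.

```yaml
version: "3.9"
services:
  azure-cognitive-service-read:
    image: mcr.microsoft.com/azure-cognitive-services/vision/read:3.2  # assumption; verify image name and tag in MCR
    environment:
      - EULA=accept
      - billing={COMPUTER_VISION_ENDPOINT_URI}
      - apiKey={COMPUTER_VISION_KEY}
    ports:
      - "5000:5000"

  azure-cognitive-service-id-document:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/id-document:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
      # Variable name is an assumption modeled on AzureCognitiveServiceLayoutHost.
      - AzureCognitiveServiceReadHost=http://azure-cognitive-service-read:5000
    ports:
      - "5001:5000"
    depends_on:
      - azure-cognitive-service-read
```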
Now, you can start the service with the docker compose command:
docker-compose up
The following code sample is a self-contained docker compose example to run Document Intelligence Business Card and Read containers together. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration. Enter {FORM_RECOGNIZER_ENDPOINT_URI} and {FORM_RECOGNIZER_KEY} values for your Business Card container instance. Enter {COMPUTER_VISION_ENDPOINT_URI} and {COMPUTER_VISION_KEY} values for your Azure AI Vision Read container.
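The sketch below pairs the Business Card container with the Azure AI Vision Read container; the image paths, tags, and the AzureCognitiveServiceReadHost variable name are assumptions to verify in the Microsoft Container Registry and the container documentation.

```yaml
version: "3.9"
services:
  azure-cognitive-service-read:
    image: mcr.microsoft.com/azure-cognitive-services/vision/read:3.2  # assumption; verify image name and tag in MCR
    environment:
      - EULA=accept
      - billing={COMPUTER_VISION_ENDPOINT_URI}
      - apiKey={COMPUTER_VISION_KEY}
    ports:
      - "5000:5000"

  azure-cognitive-service-businesscard:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/businesscard:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
      # Variable name is an assumption modeled on AzureCognitiveServiceLayoutHost.
      - AzureCognitiveServiceReadHost=http://azure-cognitive-service-read:5000
    ports:
      - "5001:5000"
    depends_on:
      - azure-cognitive-service-read
```

You can then start both services with the docker-compose up command.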
Gather a set of at least six forms of the same type. You use this data to train the model and test a form. You can use a sample data set (download and extract sample_data.zip). Download the training files to the shared folder you created.
If you want to label your data, download the Document Intelligence Sample Labeling tool for Windows. The download imports the labeling tool .exe file that you use to label the data present on your local file system. You can ignore any warnings that occur during the download process.
Create a new Sample Labeling tool project
1. Open the labeling tool by double-clicking the Sample Labeling tool .exe file.
2. On the left pane of the tool, select the connections tab.
3. Select the option to create a new project and give it a name and description.
4. For the provider, choose the local file system option. For the local folder, make sure you enter the path to the folder where you stored the sample data files.
5. Navigate back to the home tab and select the "Use custom to train a model with labels and key-value pairs" option.
6. Select the train button on the left pane to train the labeled model.
7. Save this connection and use it to label your requests.
8. You can choose to analyze the file of your choice against the trained model.
Create a docker compose file and name it docker-compose.yml.
The following code sample is a self-contained docker compose example to run Document Intelligence Layout, Label Tool, Custom API, and Custom Supervised containers together. With docker compose, you use a YAML file to configure your application's services. Then, with the docker-compose up command, you create and start all the services from your configuration.
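A compact sketch under stated assumptions follows: the image names, tags, and the label tool port are assumptions to verify in the Microsoft Container Registry, and depending on your setup the custom containers may also need shared volume mounts for models and training files per the container documentation.

```yaml
version: "3.9"
services:
  azure-cognitive-service-layout:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/layout:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    ports:
      - "5000:5000"

  azure-cognitive-service-custom-api:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/custom-api:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    ports:
      - "5001:5000"

  azure-cognitive-service-custom-supervised:
    image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/custom-supervised:latest  # assumption; verify in MCR
    environment:
      - EULA=accept
      - billing={FORM_RECOGNIZER_ENDPOINT_URI}
      - apiKey={FORM_RECOGNIZER_KEY}
    ports:
      - "5002:5000"

  azure-cognitive-service-labeltool:
    image: mcr.microsoft.com/azure-cognitive-services/custom-form/labeltool:latest-2.1  # assumption; verify in MCR
    ports:
      - "3000:3000"   # port is an assumption
```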
To ensure that the service is up and running, run these commands in an Ubuntu shell:

```
cd <folder containing the docker-compose file>
source .env
docker-compose up
```
Create a new connection
1. On the left pane of the tool, select the connections tab.
2. Select create a new project and give it a name and description.
3. For the provider, choose the local file system option. For the local folder, make sure you enter the path to the folder where you stored the sample data files.
4. Navigate back to the home tab and select Use custom to train a model with labels and key-value pairs.
5. Select the train button on the left pane to train the labeled model.
6. Save this connection and use it to label your requests.
7. You can choose to analyze the file of your choice against the trained model.
The Sample Labeling tool and Azure Container Instances (ACI)
There are several ways to validate that the container is running:
The container provides a homepage at / as a visual validation that the container is running.
You can open your favorite web browser and navigate to the external IP address and exposed port of the container in question. Use the listed request URLs to validate the container is running. The listed example request URLs are http://localhost:5000, but your specific container may vary. Keep in mind that you're navigating to your container's External IP address and exposed port.
| Request URL | Purpose |
|---|---|
| http://localhost:5000/ | The container provides a home page. |
| http://localhost:5000/ready | Requested with GET, this request provides a verification that the container is ready to accept a query against the model. This request can be used for Kubernetes liveness and readiness probes. |
| http://localhost:5000/status | Requested with GET, this request verifies if the api-key used to start the container is valid without causing an endpoint query. This request can be used for Kubernetes liveness and readiness probes. |
| http://localhost:5000/swagger | The container provides a full set of documentation for the endpoints and a Try it out feature. With this feature, you can enter your settings into a web-based HTML form and make the query without having to write any code. After the query returns, an example CURL command is provided to demonstrate the HTTP headers and body format that's required. |
Stop the containers
To stop the containers, use the following command:
docker-compose down
Billing
The Document Intelligence containers send billing information to Azure by using a Document Intelligence resource on your Azure account.
Queries to the container are billed at the pricing tier of the Azure resource that's used for the Key. You're billed for each container instance used to process your documents and images.
Note
Currently, Document Intelligence v3 containers only support pay as you go pricing. Support for commitment tiers and disconnected mode will be added in March 2023.
Azure AI containers aren't licensed to run without being connected to the metering / billing endpoint. Containers must be enabled to always communicate billing information with the billing endpoint. Azure AI containers don't send customer data, such as the image or text that's being analyzed, to Microsoft.
For example, if you use the Business Card feature, you're billed for the Document Intelligence Business Card and Azure AI Vision Read container instances. For the Invoice feature, you're billed for the Document Intelligence Invoice and Layout container instances. See Document Intelligence and Azure AI Vision Read feature container pricing.
Connect to Azure
The container needs the billing argument values to run. These values allow the container to connect to the billing endpoint. The container reports usage about every 10 to 15 minutes. If the container doesn't connect to Azure within the allowed time window, the container continues to run, but doesn't serve queries until the billing endpoint is restored. The connection is attempted 10 times at the same time interval of 10 to 15 minutes. If it can't connect to the billing endpoint within the 10 tries, the container stops serving requests. See the Azure AI container FAQ for an example of the information sent to Microsoft for billing.
Billing arguments
The docker-compose up command starts the container when all three of the following options are provided with valid values:
Option
Description
ApiKey
The key of the Azure AI services resource that's used to track billing information. The value of this option must be set to a key for the provisioned resource that's specified in Billing.
Billing
The endpoint of the Azure AI services resource that's used to track billing information. The value of this option must be set to the endpoint URI of a provisioned Azure resource.
Eula
Indicates that you accepted the license for the container. The value of this option must be set to accept.
That's it! In this article, you learned concepts and workflows for downloading, installing, and running Document Intelligence containers. In summary:

- Document Intelligence provides seven Linux containers for Docker.
- Container images are downloaded from the Microsoft Container Registry (MCR).
- Container images run in Docker.
- The billing information must be specified when you instantiate a container.
Important
Azure AI containers are not licensed to run without being connected to Azure for metering. Customers need to enable the containers to communicate billing information with the metering service at all times. Azure AI containers do not send customer data (for example, the image or text that is being analyzed) to Microsoft.