Training and Using Custom Models in Azure Form Recognizer Containers

Question


Valentin Brosch 25

Hi,

We are currently evaluating whether Azure Form Recognizer can be used to build an OCR pipeline on sensitive data. We have already applied for the connected container; however, I was not able to find any information on whether it is possible to train custom models in the Docker containers.

I have two questions:

Is it possible to train custom models using the Docker containers without sending any document to the cloud?
If it is not possible to train them: Can we at least use our cloud-trained models in the connected containers? This way, we could train our model on synthetic data/forms in the cloud (without PII data), "download" the model to the containers, and run the inference on the real and sensitive data locally.

Thanks

Valentin

Valentin Brosch 25 Reputation points

2023-05-30T17:10:23.5733333+00:00

@VasimTamboli Thanks for your answer. That should be possible for us. Could you please provide some instructions on how we can import the trained models into the containers?

Once we set up the container and link it to our Azure subscription, can we just pass in the name/identifier of the model so that it is downloaded automatically? Or is this a manual step? If the latter, where should we put the model and which image tag do we need to use?

Sorry for all the questions, but I did not find any mention of this use case in the official docs...

YutongTie-MSFT 53,971 Reputation points Moderator

2023-05-30T23:43:50.63+00:00

Hello @Valentin Brosch, are you concerned about data privacy? The documentation states: "Cognitive Services containers do not send customer data (for example, the image or text that is being analyzed) to Microsoft."

Please see the important notice under the summary in the document - https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/containers/form-recognizer-container-install-run?view=form-recog-3.0.0&tabs=read#summary

I hope this helps.

Regards, Yutong. Please accept the answer if you find it helpful, to support the community. Thanks a lot.

Valentin Brosch 25 Reputation points

2023-05-31T06:59:07.6733333+00:00

Hi @YutongTie-MSFT, thanks for your answer. Yes, we are very cautious, as the data is highly protected by law and we must also comply with the GDPR. I know that the containers do not send any information (apart from billing) to the cloud during inference; what we are interested in is how we can train custom models and run them inside the container.

Regards

Valentin
VasimTamboli 5,215 Reputation points

2023-05-31T07:12:59.16+00:00
@Valentin Brosch

When using Azure Form Recognizer containers, you can import the trained models into the containers for local inference. Here's an overview of the process:

Train the model in the cloud: Using the cloud-based Form Recognizer service, you can train and generate custom models using synthetic data or forms without sensitive information. This process requires access to the cloud service, and you can follow the documentation on how to train a custom model using Azure Form Recognizer.

Download the trained model: Once the model is trained in the cloud, you can download it as a model file. The model file will be in the form of a pre-built Docker image (.docker) or a TensorFlow SavedModel (.zip). Download the appropriate model file based on your requirements.

Set up and link the container: Set up the Azure Form Recognizer container in your local environment and link it to your Azure subscription. Ensure that you have the necessary access and permissions to use the container with your subscription.

Import the trained model: In the container setup, there will be a specific location or directory where you can put the downloaded model file. This location can vary based on the container image you're using. Consult the documentation or instructions provided with the container image to determine the correct location.

Specify the model identifier: Once the model file is in the correct location within the container, you need to specify the model identifier when using the container for local inference. This identifier could be the name of the model or an identifier associated with the model. The exact method for specifying the model identifier can depend on the container image and the specific API or SDK you're using.

Local inference: With the trained model imported and the model identifier specified, you can now use the container for local inference on real and sensitive data. The container will utilize the downloaded model to perform OCR tasks without sending the documents to the cloud.
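As a rough illustration of the "set up and link the container" step, the sketch below only assembles a `docker run` command line; it does not start anything. `Eula`, `Billing`, and `ApiKey` are the settings Cognitive Services containers use for licensing and billing. The image name and tag, the port mapping, and the placeholder endpoint and key are assumptions for illustration, and a custom-model setup may need further containers and settings that are not shown here.

```python
import shlex


def build_docker_run_command(image, billing_endpoint, api_key, host_port=5000):
    """Return the argument list for starting a Form Recognizer container.

    The container is assumed to listen on port 5000 internally.
    """
    return [
        "docker", "run", "--rm", "-it",
        "-p", f"{host_port}:5000",
        image,
        "Eula=accept",                    # accept the container license
        f"Billing={billing_endpoint}",    # endpoint of the Azure resource used for billing
        f"ApiKey={api_key}",              # key of that same resource
    ]


# Hypothetical values: replace with the image, endpoint, and key for your resource.
IMAGE = "mcr.microsoft.com/azure-cognitive-services/form-recognizer/custom-template-3.0:latest"

if __name__ == "__main__":
    command = build_docker_run_command(
        IMAGE,
        "https://<your-resource>.cognitiveservices.azure.com/",
        "<your-api-key>",
    )
    # Print a shell-quoted command line that can be reviewed before running it.
    print(shlex.join(command))
```

Building the command as a list keeps the billing settings in one reviewable place and avoids quoting mistakes if the list is later passed to `subprocess.run`.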

Here is a more detailed step-by-step guide on how to import trained models into Azure Form Recognizer containers for local inference:

Step 1: Train the model in the cloud using Azure Form Recognizer service

Follow the official Azure Form Recognizer documentation on how to train a custom model using the cloud-based service.

Use synthetic data or forms without sensitive information to train the model.

Ensure that you have the necessary access and permissions to train and generate custom models in the cloud.

Step 2: Download the trained model from Azure Form Recognizer

Once the model is trained in the cloud, download the model file.

The model file will be in the form of a pre-built Docker image (.docker) or a TensorFlow SavedModel (.zip), depending on your selection during training.

Step 3: Set up the Azure Form Recognizer container

Set up the Azure Form Recognizer container in your local environment or on-premises infrastructure.

Ensure that you have the necessary dependencies and resources to run the container.

Step 4: Link the container to your Azure subscription

Connect the Azure Form Recognizer container to your Azure subscription.

Follow the documentation provided by Azure to establish the link between your container and Azure.

Step 5: Import the trained model into the container

Determine the appropriate location or directory within the container where the model file needs to be placed.

Consult the documentation or instructions provided with the container image to find the correct location.

Copy the downloaded model file (Docker image or TensorFlow SavedModel) into the designated location within the container.

Step 6: Specify the model identifier in the container

Depending on the container image and the specific API or SDK you're using, you need to specify the model identifier.

The model identifier could be the name of the model or an identifier associated with the model.

Refer to the documentation or instructions provided with the container image to understand how to specify the model identifier correctly.

Step 7: Perform local inference using the container

With the trained model imported into the container and the model identifier specified, you can now use the container for local inference.

Utilize the container's OCR capabilities to process real and sensitive data locally, without sending the documents to the cloud.

Please note that the exact steps and instructions may vary depending on the container image and the specific implementation you are using. It's essential to refer to the documentation and instructions provided with the container image to get accurate and detailed guidance on each step.
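To make "specify the model identifier" and "perform local inference" more concrete, here is a minimal sketch that builds an analyze request against a locally running container without sending it. It assumes the container listens on `http://localhost:5000` and exposes the same v3.0 REST route as the cloud service (`/formrecognizer/documentModels/{modelId}:analyze` with `api-version=2022-08-31`). The model ID `my-custom-model` and the file name are hypothetical. The sketch says nothing about how the model reaches the container.

```python
import urllib.request
from urllib.parse import quote

# v3.0 GA REST API version; assumed to match the container image in use.
API_VERSION = "2022-08-31"


def build_analyze_request(base_url, model_id, document_bytes,
                          content_type="application/pdf"):
    """Build (but do not send) a POST request that analyzes one document.

    The model identifier is part of the URL path, so switching models
    means changing only `model_id`.
    """
    url = (
        f"{base_url.rstrip('/')}/formrecognizer/documentModels/"
        f"{quote(model_id, safe='')}:analyze?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=document_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical local file and model ID.
    with open("sample-form.pdf", "rb") as handle:
        request = build_analyze_request(
            "http://localhost:5000", "my-custom-model", handle.read()
        )
    print(request.get_method(), request.full_url)
```

With the cloud v3.0 API, sending such a request returns an `Operation-Location` header that is polled for the result; the container is assumed to behave the same way. The cloud service also expects an `Ocp-Apim-Subscription-Key` header, which is omitted here on the assumption that the local container does not require it.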

If you have any further questions or need assistance with specific details, feel free to ask!



Answer 1

VasimTamboli 5,215

Training custom models using the Azure Form Recognizer containers without sending documents to the cloud is currently not supported. The training process requires access to the cloud-based Form Recognizer service to train and generate custom models.

However, you can still train your models in the cloud using synthetic data/forms without any sensitive information. Once you have trained the model in the cloud, you can download the model and use it in the connected containers for local inference on real and sensitive data. This way, you can maintain the privacy of your sensitive data while leveraging the trained model in your OCR pipeline.

Keep in mind that you would need to comply with the licensing terms and conditions, as well as any regulatory requirements or data protection policies that apply to your sensitive data.
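One way to back up the "local inference only" requirement in client code is a guard that refuses to send documents to anything but a loopback address. The helper below is a hypothetical sketch, not part of any Azure SDK, and it deliberately avoids DNS lookups: any host name other than `localhost` is treated as non-local.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_endpoint(url):
    """Return True only if the URL points at this machine (localhost or loopback IP)."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A DNS name other than localhost: treat it as non-local.
        return False


def require_local_endpoint(url):
    """Raise before any sensitive document is sent to a non-local endpoint."""
    if not is_loopback_endpoint(url):
        raise ValueError(f"Refusing to send sensitive data to non-local endpoint: {url}")
    return url


if __name__ == "__main__":
    print(require_local_endpoint("http://localhost:5000"))
```

If the container runs on a different on-premises host, the check would need an explicit allow-list of internal addresses instead of the loopback test.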

I hope this clarifies the possibilities for using Azure Form Recognizer in your OCR pipeline. If you have any further questions, feel free to ask.

Valentin Brosch 25 Reputation points

2023-05-31T07:05:19.1266667+00:00

Hi,

I don't want to push, but I just realized that I added my further questions as an answer instead of a comment to your post. I don't know if you get a notification of my post or not...

It would be excellent if you could tell me how I can import trained models from the cloud into the on-prem containers (cf. my answer above).

But please take your time...

Regards

Valentin
Valentin Brosch 25 Reputation points

2023-05-31T07:43:46.4633333+00:00

Thank you very much. That makes it much clearer. I will come back to you if I run into any issues.
Valentin Brosch 25 Reputation points

2023-05-31T07:46:12.51+00:00

Thank you! I think we can work this out...
John Antony 5 Reputation points

2023-08-08T17:39:57.72+00:00

How do we download the trained model? Also, whenever I train a new model, I have to update my local containers to make use of the new model, right?
Michael 45 Reputation points

2023-08-21T11:10:39.7066667+00:00

May I ask how to download the models? Using the Form Recognizer Studio, I can only see the option to copy the model, but there's no download button.

