Training and Using Custom Models in Azure Form Recognizer Containers

Valentin Brosch
2023-05-30T13:33:44.7033333+00:00

Hi,

We are currently evaluating whether Azure Form Recognizer can be used to build an OCR pipeline for sensitive data. We have already applied for access to the connected containers; however, I could not find any information on whether custom models can be trained in the Docker containers.

I have two questions:

  1. Is it possible to train custom models using the Docker containers without sending any document to the cloud?
  2. If it is not possible to train them, can we at least use our cloud-trained models in the connected containers? That way, we could train our model in the cloud on synthetic data/forms (without PII), "download" the model to the containers, and run inference on the real, sensitive data locally.

Thanks

Valentin

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
    VasimTamboli
    2023-05-30T15:08:05.38+00:00

    Training custom models using the Azure Form Recognizer containers without sending documents to the cloud is currently not supported. The training process requires access to the cloud-based Form Recognizer service to train and generate custom models.

    However, you can still train your models in the cloud on synthetic data or forms that contain no sensitive information. Once the model is trained in the cloud, you can download it and use it in the connected containers for local inference on the real, sensitive data. This way, your sensitive data stays private while the trained model still powers your OCR pipeline.
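    If you adopt this split, it can help to enforce it in code. The following Python sketch is a hypothetical safeguard, not part of any Form Recognizer SDK: it refuses to send real documents to an endpoint whose host is not `localhost` or a loopback/private IP address. The assumption that your container is reachable only on such an address is one you would need to verify for your own network.

```python
from ipaddress import ip_address
from urllib.parse import urlparse


def is_local_endpoint(endpoint: str) -> bool:
    """Return True if the endpoint host is localhost or a loopback/private IP.

    Assumption (hypothetical policy): the local container is reached via
    localhost or a private network address. DNS names, including
    *.cognitiveservices.azure.com, are treated as non-local.
    """
    host = urlparse(endpoint).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a DNS name: treat it as remote.
        return False
    return addr.is_loopback or addr.is_private


def require_local(endpoint: str) -> str:
    """Return the endpoint unchanged, or raise if it is not local."""
    if not is_local_endpoint(endpoint):
        raise ValueError(f"refusing to send sensitive documents to {endpoint}")
    return endpoint


if __name__ == "__main__":
    print(require_local("http://localhost:5000"))
```

    Calling `require_local(...)` on the endpoint right before every upload of real data keeps the check in one place.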

    Keep in mind that you would need to comply with the licensing terms and conditions, as well as any regulatory requirements or data protection policies that apply to your sensitive data.

    I hope this clarifies the possibilities for using Azure Form Recognizer in your OCR pipeline. If you have any further questions, feel free to ask.

1 additional answer

    VasimTamboli
    2023-05-31T07:23:46.73+00:00

    @Valentin Brosch

    When using Azure Form Recognizer containers, you can import the trained models into the containers for local inference. Here's an overview of the process:

    Train the model in the cloud: Use the cloud-based Form Recognizer service to train custom models on synthetic data or forms that contain no sensitive information. This step requires access to the cloud service; follow the Azure Form Recognizer documentation on training a custom model.

    Download the trained model: Once the model is trained in the cloud, obtain it for use in the container. The mechanism and format depend on the container version, so check the container documentation for the supported way to make a cloud-trained model available to a container.

    Set up and link the container: Set up the Azure Form Recognizer container in your local environment and link it to your Azure subscription. Ensure that you have the necessary access and permissions to use the container with your subscription.

    Import the trained model: In the container setup, there will be a specific location or directory where you can put the downloaded model file. This location can vary based on the container image you're using. Consult the documentation or instructions provided with the container image to determine the correct location.

    Specify the model identifier: Once the model file is in the correct location within the container, you need to specify the model identifier when using the container for local inference. This identifier could be the name of the model or an identifier associated with the model. The exact method for specifying the model identifier can depend on the container image and the specific API or SDK you're using.

    Local inference: With the trained model imported and the model identifier specified, you can now use the container for local inference on real and sensitive data. The container will utilize the downloaded model to perform OCR tasks without sending the documents to the cloud.

    Here are more detailed steps for importing trained models into Azure Form Recognizer containers for local inference:

    Step 1: Train the model in the cloud using Azure Form Recognizer service

    Follow the official Azure Form Recognizer documentation on how to train a custom model using the cloud-based service.

    Use synthetic data or forms without sensitive information to train the model.

    Ensure that you have the necessary access and permissions to train and generate custom models in the cloud.
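    As a sketch of what the cloud training call can look like, the following Python code prepares (but does not send) a request to the Form Recognizer v3.0 REST "build model" operation using only the standard library. It assumes API version 2022-08-31, key-based authentication, and training documents stored in an Azure Blob Storage container reachable through a SAS URL; the endpoint, key, model ID, and SAS URL are placeholders.

```python
import json
from urllib.request import Request

# Assumption: the v3.0 GA REST API version; adjust to the version you target.
API_VERSION = "2022-08-31"


def build_model_request(endpoint: str, key: str, model_id: str,
                        container_sas_url: str, prefix: str = "") -> Request:
    """Prepare (but do not send) a request that trains a custom template model.

    The training set (synthetic, non-sensitive forms) lives in a blob
    container addressed by a SAS URL; `prefix` optionally narrows it to a folder.
    """
    url = (f"{endpoint.rstrip('/')}/formrecognizer/documentModels:build"
           f"?api-version={API_VERSION}")
    source = {"containerUrl": container_sas_url}
    if prefix:
        source["prefix"] = prefix
    body = {
        "modelId": model_id,
        "buildMode": "template",  # "neural" is the other build mode
        "azureBlobSource": source,
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_model_request(
        "https://<your-resource>.cognitiveservices.azure.com/",       # placeholder
        "<your-key>",                                                  # placeholder
        "synthetic-forms-v1",                                          # hypothetical model ID
        "https://<account>.blob.core.windows.net/<container>?<sas>",   # placeholder
    )
    print(req.get_method(), req.full_url)
```

    Sending the request with `urllib.request.urlopen(req)` should return `202 Accepted` with an `Operation-Location` header that you poll until training finishes. The `azure-ai-formrecognizer` Python package wraps the same operation as `DocumentModelAdministrationClient.begin_build_document_model`.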

    Step 2: Download the trained model from Azure Form Recognizer

    Once the model is trained in the cloud, obtain it for use in the container.

    The mechanism and format depend on the container version, so check the container documentation for the supported way to do this.

    Step 3: Set up the Azure Form Recognizer container

    Set up the Azure Form Recognizer container in your local environment or on-premises infrastructure.

    Ensure that you have the necessary dependencies and resources to run the container.
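    Connected Azure AI containers are typically started with `docker run`, passing `Eula=accept` plus the `Billing` endpoint and `ApiKey` of your Azure resource (these two arguments tie the container to your subscription for metering). The Python sketch below only assembles that command line. The image name, host port, memory, and CPU values are placeholders; take the real image name and resource requirements from the container documentation.

```python
import shlex


def docker_run_args(image: str, billing_endpoint: str, api_key: str,
                    host_port: int = 5000, memory: str = "8g",
                    cpus: int = 2) -> list[str]:
    """Assemble a `docker run` command for a connected container.

    The container serves its API on port 5000; `host_port` maps it on the host.
    The memory/CPU defaults are placeholders, not documented requirements.
    """
    return [
        "docker", "run", "--rm", "-d",
        "-p", f"{host_port}:5000",
        "--memory", memory,
        "--cpus", str(cpus),
        image,
        "Eula=accept",
        f"Billing={billing_endpoint}",
        f"ApiKey={api_key}",
    ]


if __name__ == "__main__":
    args = docker_run_args(
        "<form-recognizer-container-image>",                    # placeholder
        "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
        "<your-key>",                                            # placeholder
    )
    # Inspect the command first; pass `args` to subprocess.run(args, check=True)
    # once the placeholders are filled in.
    print(shlex.join(args))
```

    Building the command as a list (rather than one string) avoids shell-quoting problems when the key or endpoint contains special characters.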

    Step 4: Link the container to your Azure subscription

    Connect the Azure Form Recognizer container to your Azure subscription.

    Follow the documentation provided by Azure to establish the link between your container and Azure.
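    After the container starts, you can check that the link works. Azure AI containers generally expose a `/status` endpoint that, on a GET request, validates the API key the container was started with, without running an analysis. The Python sketch below assumes the container is reachable at `http://localhost:5000`.

```python
from urllib.request import urlopen


def status_url(container_endpoint: str) -> str:
    """URL of the container's key-validation endpoint."""
    return f"{container_endpoint.rstrip('/')}/status"


def container_is_linked(container_endpoint: str, timeout: float = 5.0) -> bool:
    """Return True only if GET /status answers with HTTP 200.

    Connection failures, timeouts, and HTTP errors (URLError and HTTPError
    are both OSError subclasses) all count as "not linked".
    """
    try:
        with urlopen(status_url(container_endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


if __name__ == "__main__":
    print(container_is_linked("http://localhost:5000"))
```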

    Step 5: Import the trained model into the container

    Determine the appropriate location or directory within the container where the model file needs to be placed.

    Consult the documentation or instructions provided with the container image to find the correct location.

    Copy the model into the designated location within the container.

    Step 6: Specify the model identifier in the container

    Depending on the container image and the specific API or SDK you're using, you need to specify the model identifier.

    The model identifier could be the name of the model or an identifier associated with the model.

    Refer to the documentation or instructions provided with the container image to understand how to specify the model identifier correctly.
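    For Form Recognizer v3.0 containers, the model identifier typically appears in the path of the analyze request, in the same form as the cloud REST API. The Python sketch below builds such a URL. It assumes the container is on `http://localhost:5000` and serves API version 2022-08-31; `synthetic-forms-v1` is a hypothetical model ID, and prebuilt models such as `prebuilt-layout` follow the same pattern.

```python
from urllib.parse import quote

# Assumption: the container serves the v3.0 REST API at this version.
API_VERSION = "2022-08-31"


def analyze_url(container_endpoint: str, model_id: str) -> str:
    """Build the analyze URL for a given model ID on a local container."""
    # quote() keeps unreserved characters (letters, digits, "-._~") as-is
    # and percent-encodes anything else, including "/".
    return (f"{container_endpoint.rstrip('/')}/formrecognizer/documentModels/"
            f"{quote(model_id, safe='')}:analyze?api-version={API_VERSION}")


if __name__ == "__main__":
    print(analyze_url("http://localhost:5000", "synthetic-forms-v1"))
```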

    Step 7: Perform local inference using the container

    With the trained model imported into the container and the model identifier specified, you can now use the container for local inference.

    Use the container's OCR capabilities to process real, sensitive data locally, without sending the documents to the cloud.
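    A standard-library Python sketch of such a local call follows. It assumes the v3.0 asynchronous pattern: POST the raw document bytes to an analyze URL on the container, read the `Operation-Location` header from the response, and poll that URL until `status` is `succeeded` or `failed`. It also assumes the container accepts requests without a key header; verify both assumptions against the container documentation.

```python
import json
import time
from urllib.request import Request, urlopen


def analyze_request(url: str, document: bytes,
                    content_type: str = "application/pdf") -> Request:
    """Prepare a POST that sends the raw document bytes to the container."""
    return Request(url, data=document,
                   headers={"Content-Type": content_type}, method="POST")


def analyze_locally(url: str, document: bytes, poll_seconds: float = 1.0,
                    max_polls: int = 60) -> dict:
    """Submit a document and poll the returned operation until it finishes."""
    with urlopen(analyze_request(url, document)) as resp:
        operation_url = resp.headers["Operation-Location"]
    if not operation_url:
        raise RuntimeError("response had no Operation-Location header")
    for _ in range(max_polls):
        with urlopen(operation_url) as resp:
            result = json.load(resp)
        # Terminal states; "notStarted" and "running" mean keep polling.
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(poll_seconds)
    raise TimeoutError(f"analysis not finished after {max_polls} polls")
```

    With a running container, you would read the file in binary mode and pass its bytes together with an analyze URL such as `http://localhost:5000/formrecognizer/documentModels/<model-id>:analyze?api-version=2022-08-31`. For a custom model, the extracted fields should appear under `analyzeResult.documents` in the returned dictionary.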

    Please note that the exact steps and instructions may vary depending on the container image and the specific implementation you are using. It's essential to refer to the documentation and instructions provided with the container image to get accurate and detailed guidance on each step.

    If you have any further questions or need assistance with specific details, feel free to ask!
