Azure Document Intelligence: Is customer data used to train models?

LSB 20 Reputation points
2024-08-13T20:31:18.15+00:00

Does Microsoft train on customer data, specifically for Azure Document Intelligence?

I assume they do not, but I have not found an official statement.

Here's what I've found:

  • This page on data in Document Intelligence has a clause at the bottom I can't understand: in the section "deletes data" it says "and not used for any other purpose." It's unclear what this means, and it seems to apply only after data is deleted.
  • This page on customer data has no mention of training models either way.
  • This Q&A has a reply saying customer data isn't used for training, specifically for the Azure AI Content Safety product.
  • This FAQ says customer data isn't used to "retrain" models, specifically for the Azure OpenAI product.

Can anyone help me find an official statement on Microsoft's policy about whether they train on customer data for the Azure Document Intelligence product?

For example, Google's statement in their competing product is clearly: "At Google Cloud, we never use customer data to train our Document AI models."

Many thanks for any help!

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. navba-MSFT 27,500 Reputation points Microsoft Employee
    2024-08-14T02:43:24.6466667+00:00

    @LSB Welcome to the Microsoft Q&A Forum. Thank you for posting your query here!

    As mentioned in our public privacy documentation, Data, privacy, and security for Document Intelligence - Azure AI services | Microsoft Learn, we can assure you that uploaded customer data is encrypted and is not used for any other purpose.

    For the prebuilt models, customer data is not used for training.

    For customer-trained models: customers can delete their models and associated metadata at any time by using the API. The interim outputs after analysis and labeling are stored in the same location. The trained custom models are stored in Azure storage in the same region and are logically isolated with their Azure subscription and API credentials. More info:

    1. The 24-hour period is needed because this is an asynchronous API: you submit a document for analysis and then request the results in a separate request.
    2. All data is always encrypted with a Microsoft-managed key, and customers have the option to double-encrypt with a customer-managed key.
    3. Your data is your data and is not used to train or improve our models.
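
    To illustrate point 1, here is a minimal sketch of the submit-then-poll pattern that an asynchronous analysis API implies. It is not the service's SDK: `wait_for_result`, `FakeService`, the example URL, and the status strings ("running", "succeeded", "failed") are all hypothetical names chosen for illustration, and the sketch assumes only that the submit call returns a URL to poll. Results have to be held somewhere between the two requests, which is why a retention window exists at all.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(
    submit: Callable[[], str],
    fetch: Callable[[str], Dict[str, Any]],
    interval_s: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a document once, then poll the returned URL until it finishes."""
    operation_url = submit()  # request 1: start the analysis
    for _ in range(max_attempts):
        body = fetch(operation_url)  # later requests: ask for the result
        status = body.get("status")  # status values are assumed, see lead-in
        if status == "succeeded":
            return body
        if status == "failed":
            raise RuntimeError(f"analysis failed: {body}")
        sleep(interval_s)  # not ready yet, wait before asking again
    raise TimeoutError("analysis did not finish in time")


class FakeService:
    """In-memory stand-in for the real service (hypothetical, for illustration)."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self.polls_until_done = polls_until_done
        self.calls = 0

    def submit(self) -> str:
        # A real client would upload the document and receive an operation URL.
        return "https://example.invalid/operations/123"

    def fetch(self, url: str) -> Dict[str, Any]:
        self.calls += 1
        if self.calls <= self.polls_until_done:
            return {"status": "running"}
        return {"status": "succeeded", "analyzeResult": {"pages": 1}}


if __name__ == "__main__":
    service = FakeService()
    result = wait_for_result(service.submit, service.fetch, sleep=lambda s: None)
    print(result["status"])
```

    In a real client, `submit` and `fetch` would wrap authenticated HTTP calls to your own resource endpoint; passing them in as callables keeps the polling logic testable without network access.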

    I have also asked the product group (PG) to update our public documentation to make it more comprehensive, and they agreed to do so.

    Our commitment to you:

    We’re transparent about the specific policies, operational practices, and technologies that help ensure the security, compliance, and privacy of your data across Microsoft services.

    1. You control your data.
    2. We're transparent about where data is located and how it’s used.
    3. We secure data at rest and in transit.
    4. We defend your data.

    More info here and here.

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.
