Does Microsoft use my email or my OneDrive files to train Copilot or any other LLM or AI?

Anonymous
2024-03-05T06:32:57+00:00

Now that Microsoft is in the Large Language Model business, and is actively marketing Copilot, I would like to know whether my data is being used to train your LLM. If your customers use Exchange-hosted email and host data on OneDrive, is that data used to train Copilot, OpenAI or any other AI or LLM?

I am required by professional ethics to keep my clients' data and confidences secret. I have seen enough stories about LLMs regurgitating training data that I am very concerned.

If Microsoft does not use customer data to train LLMs or AIs, where do your terms of service say that? I want to know for sure.

Thank you.

Microsoft 365 and Office | Subscription, account, billing | For business | Other

Locked Question. This question was migrated from the Microsoft Support Community.

8 answers

  1. Anonymous
    2024-03-15T15:59:01+00:00

    Hi, dodo babelfish

    Good day.

    Thank you for posting to the Microsoft Community. We are glad to assist you.

    Based on your description, please understand that Microsoft takes the privacy and security of our customers' data very seriously. We do not use customer data to train Copilot or any other LLM or AI.

    Our training data is sourced from publicly available sources and is carefully curated to ensure that it does not contain any customer data or confidential information.

    Our terms of service clearly state that we do not use customer data for training our AI models.

    If you have any further concerns or questions, please do not hesitate to contact our customer support team: Contact Us - Microsoft Support.

    Thank you for your time and understanding.

    Wishing you all the best!

    Microsoft Community Moderator.

    Where do your terms of Service state that you do not use customer data to train your LLM/AI? I read your ToS and the section on AI has no mention of the types of data used to train your models. Can you please point me in the direction of the specific pages that say that? Thank you.

    20+ people found this answer helpful.
  2. Anonymous
    2024-04-12T01:56:35+00:00

    I moved all of my files out of the cloud and suggest you do the same. Personal decision. Companies are more "data hungry" than ever before. Back to the old external SSD days.

    10+ people found this answer helpful.
  3. Anonymous
    2024-03-15T17:33:15+00:00

    Thank you very much for your very thoughtful response, which gets to the key issues here. Your response was much more helpful than the official response by AlexChenMSFT, which was just corporate boilerplate.

    The data privacy issues connected with Copilot and other LLMs are serious for many professionals who use computers. Medical personnel have medical privacy rules. Lawyers are obligated to protect their clients' confidential data. Business people are bound by NDAs or other duties to protect company data.

    All of these users can lose their jobs and/or lose their licenses if data leaks. For them, "trust us" is not enough because they may be legally obligated to do due diligence on the handling of their data. If there isn't something in writing, they may not be able to use Copilot at all. Or their companies or professional services firms may prohibit them from using Copilot.

    That's why I was looking for information on Microsoft's *written* commitment to protect client data stored via MS 365 (for instance, client data on OneDrive or in Exchange-hosted email). I still want to know where this appears in writing in the terms of service for Microsoft 365 or other Microsoft services.

    Thanks again,

    dodo babelfish

    10+ people found this answer helpful.
  4. Anonymous
    2024-06-16T08:30:17+00:00

    I have been noticing OneDrive randomly downloading my programming files to a folder which I haven't accessed in a while. I don't know how that's happening, since I just don't use that folder anymore. It should only be backed up to the cloud.

    This is exactly why I was wondering if Microsoft is doing something shady. User privacy means nothing to Microsoft, and saying so doesn't change it. It would be really great if a NON Microsoft employee who has technical knowledge responds to this post.

    Thanks in advance!

    7 people found this answer helpful.
  5. Vincent Choy 10,855 Reputation points Volunteer Moderator
    2024-03-15T16:54:29+00:00

    Specifically for Copilot for Microsoft 365: many of us who work with Microsoft products have just returned from the Microsoft MVP Summit in Redmond. The key conversations between the independent MVPs and Microsoft product teams centered on data privacy: not just the training of LLMs, but also data privacy within an organization, where only the right person should see the right data.

    Enterprise customers have been testing it, using it, and giving Microsoft early feedback since its rollout in Nov 2023. The concerns you expressed are equally important to these customers.

    This may help on statements about training of LLMs.

    https://learn.microsoft.com/en-us/microsoft-365-copilot/microsoft-365-copilot-privacy

    Microsoft product teams have repeatedly stressed that not only is customer data not used to train AI, but Copilot will also only surface content that the user already has access to.

    The design philosophy of Copilot for M365 seems to reflect this. You can study how it works in more depth in this blog post from practical365, which says:

    “Microsoft isn’t fine-tuning an LLM complex that is expensive and may not respect permissions. Instead, they do something known as grounding, giving the LLM (GPT-4) accurate information it needs, alongside your prompt, so that the LLM’s purpose is to form words and sentences based on your data, rather than based on the data it was trained against.”

    https://practical365.com/how-copilot-for-microsoft-365-works/?fbclid=IwAR0NLToX6w6ceEmLACx0aG8IT1bMacHaEd-55KQXCTLr9Llxk8_9QQf34jM_aem_AboPffBRWv3t0oHJ-_JKZcmfS4Adxe4M7wNw8315VEsdNMDeo4V7JoDuFBolCQ__6vo

    Note that I don’t work for Microsoft.

    However the general consensus from my peers of Microsoft MVPs is that we trust what Microsoft is saying at this point. You would have to decide if this is a sufficient level of proof, or seek further proof that you can be satisfied with.

    6 people found this answer helpful.
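
For readers who want a concrete picture of the "grounding" idea quoted in answer 5, here is a minimal Python sketch. It is only an illustration under stated assumptions, not Microsoft's implementation: the `Document` class, the `allowed_users` field, the word-overlap ranking, and the prompt wording are all hypothetical. The point it tries to show is that documents are filtered by the user's permissions before anything is placed in the prompt, and that the model itself is not retrained on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    text: str
    allowed_users: frozenset  # hypothetical stand-in for real access control


def retrieve(query, user, documents, limit=2):
    """Return up to `limit` documents the user may read, ranked by word overlap."""
    query_words = set(query.lower().split())
    scored = []
    for doc in documents:
        if user not in doc.allowed_users:
            continue  # permission check happens before anything reaches the model
        overlap = len(query_words & set(doc.text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:limit]]


def build_grounded_prompt(query, user, documents):
    """Combine the user's question with the retrieved text; no model weights change."""
    context = retrieve(query, user, documents)
    sources = "\n".join(f"[{doc.title}] {doc.text}" for doc in context)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"


docs = [
    Document("Budget", "the 2024 budget is approved", frozenset({"alice"})),
    Document("Merger", "the merger budget is secret", frozenset({"bob"})),
]
print(build_grounded_prompt("what is the budget", "alice", docs))
```

In this toy setup, the prompt built for "alice" contains the Budget document but nothing from the Merger document, because she is not in its `allowed_users` set. Whether a real service behaves this way is exactly what the written commitments asked about in this thread would need to confirm.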