Prepare your data for Copilot searches


To prepare for Copilot, you must get your information ready for search. For example, if your organization has already established the right information access controls and policies, then your users can reach only the information that they need, and nothing else, as they search in places like SharePoint. If your organization has already implemented these types of controls, then you're one step ahead.

If not, the good news is there are tools and controls that you can use to get visibility into how your organization shares information. You can put automated controls in place to ensure the right level of access and stop oversharing before you roll out Copilot for Microsoft 365. So just as you would prepare the information in your Microsoft 365 tenant for search, the same principles apply for Copilot, because Copilot only retrieves information each user explicitly has access to.

Diagram showing a robot looking at servers filled with data.

Data preparation tips

Organizations that improve their Microsoft 365 data quality and organization enable Copilot to generate more accurate, relevant suggestions tailored to their business needs. Administrators should consider implementing the following best practices to improve their organization's data quality:

  • Clean out redundant, outdated, and trivial (ROT) content. Perform an extensive audit of all organizational content, including documents, emails, chats, and wikis. Remove any outdated materials that are no longer accurate or relevant. For example, delete old product spec sheets from 5+ years ago, promotional emails for expired campaigns, and resolved IT ticket conversations. Enable archive/deletion policies on collaborative platforms to automate removal of stale content. Ongoing removal of obsolete, irrelevant content focuses Copilot on current, high-value information. The use of retention policies and retention labels can help organizations comply proactively with industry regulations and internal policies, reduce risk if a litigation event or a security breach occurs, and ensure users only work with content that's current and relevant to them. For more information, see Learn about retention policies and retention labels.
  • Organize content into logical folders and sites. Structure your files, emails and pages thoughtfully. Design a detailed taxonomy for categorizing and organizing documents, emails, and pages. For example, within document libraries, place financial reports under the "Finance" folder and marketing assets under "Marketing." Within SharePoint, create sites for "HR Policies," "IT Resources," "Product Documentation," and so on. Add metadata like client names and project codes on files to further identify them. When organizations design a logical information architecture in this manner, it enables Copilot to infer relationships and relevance.
  • Tag files with keywords. Make extensive use of labels, hashtags, and metadata tags on all documents, emails, and pages to describe their characteristics. For example, label customer support tickets with #refund, #payment-issue, or other issue tags. Add product attributes like model number, market, and manufacturing year as metadata tags on images and spec sheets. Thorough labeling and tagging allows Copilot to rapidly categorize, search, and recommend content.
  • Standardize file names. Mandate consistent file naming conventions like "Q3 2023 Earnings Report" instead of abbreviation-filled names. Set up recommended templates for documents and presentations. Organizations that use consistent, descriptive names rather than abbreviations enable Copilot to better grasp content.
  • Consolidate multiple versions. Wherever feasible, retain only the final version of documents, presentations, and so on. Find old iterations of files and consolidate to only retain the most current version. The final version should clearly indicate it's the latest. Eliminating redundant drafts and outdated versions reduces confusion and contradictions for Copilot.
  • Promote data hygiene habits. Implement organization-wide training and change management to promote good data hygiene habits among employees. Provide guidelines on effectively naming files, tagging content, retaining only current versions, deleting stale emails and content, and other practices. Consider gamification by recognizing top contributors to data hygiene. Organizations should also monitor data usage consent. If any data sets include personal information, ensure employees provide proper consent for use in Copilot. Build data quality expectations into employee goal setting and reviews. Organizations that establish a culture focused on maintaining clean, well-organized data ensure high quality over time, which maximizes Copilot's effectiveness.
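
As an illustration of the cleanup and version-consolidation tips above, the following minimal Python sketch works on a file inventory that has already been exported into simple records. Everything in it is an assumption made for illustration: the FileRecord shape, the five-year staleness threshold, and the version-suffix pattern are hypothetical, and the sketch doesn't call any Microsoft 365 API. It only flags candidates for a person to review; it doesn't delete anything.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath
import re


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical inventory row: a library-relative path and its last edit date."""
    path: str
    last_modified: date


# Assumed threshold: content untouched for roughly 5+ years is a review candidate.
STALE_AFTER_DAYS = 5 * 365

# Assumed markers of a draft or superseded copy at the end of a file name,
# for example "Report v2", "Report_draft", or "Report (final)".
VERSION_SUFFIX = re.compile(r"[ _-]+\(?(?:v\d+|draft|final|copy)\)?$", re.IGNORECASE)


def stale_candidates(records, today, max_age_days=STALE_AFTER_DAYS):
    """Return records whose last edit is older than max_age_days."""
    return [r for r in records if (today - r.last_modified).days > max_age_days]


def base_name(path):
    """Strip the extension and any trailing version markers, then lowercase."""
    stem = PurePosixPath(path).stem
    previous = None
    while previous != stem:  # repeat so "Report v2 final" loses both markers
        previous = stem
        stem = VERSION_SUFFIX.sub("", stem)
    return stem.strip().lower()


def version_groups(records):
    """Group files in the same folder that share a base name.

    Only groups with more than one member are returned, newest file first,
    so the first entry is the likely version to keep.
    """
    groups = defaultdict(list)
    for record in records:
        folder = str(PurePosixPath(record.path).parent)
        groups[(folder, base_name(record.path))].append(record)
    return {
        key: sorted(members, key=lambda r: r.last_modified, reverse=True)
        for key, members in groups.items()
        if len(members) > 1
    }
```

Calling stale_candidates(records, date.today()) yields the files to review for archiving, and version_groups(records) yields sets of likely duplicates. The last-modified date is only a proxy for relevance, so a data steward should confirm each candidate before anything is removed.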

Data governance considerations

When administrators prepare data for Copilot, they should also consider the following data governance factors:

  • Assign a data steward to oversee preparation and continue maintaining quality. Organizations have sensitive information under their control such as financial data, proprietary data, credit card numbers, health records, or social security numbers. As such, they should consider designating an experienced data governance expert or a team of governance experts as the official data steward for Microsoft 365. The data steward should be responsible for auditing data, establishing access rules, training users on hygiene, continuously monitoring how Microsoft 365 and Copilot utilize organizational data, and enacting improvements. Having an accountable data steward helps ingrain excellent data habits and promotes accountability across the AI lifecycle. To help protect their sensitive data and reduce the risk from oversharing, organizations must prevent their users from inappropriately sharing sensitive data with people who shouldn't have it. They can accomplish this goal by implementing sensitivity labels and data loss prevention (DLP) policies. For more information, see Learn about data loss prevention and Learn about sensitivity labels.

  • Document your data policies and practices related to Microsoft 365 and Copilot utilization. As a best practice, organizations should formally outline their data management policies, access rules, use cases, and procedures related to data security and governance in Microsoft 365. As previously stated, the powerful security tools within the Microsoft 365 and Azure ecosystems can help organizations tighten permissions and implement "just enough access." The policies and settings that administrators define in these tools are used not only by Microsoft 365 to prevent data oversharing, but also by Copilot for Microsoft 365. Organizations that don't document their data governance policies should consider doing so prior to implementing Copilot. A comprehensive data governance policy should codify rules for:

    • Restricted data
    • Anonymization procedures
    • Stewardship roles
    • Employee training requirements
    • Access authorization procedures
    • Monitoring practices
    • Other enforceable policies

    An organization should share its data governance policy across the entire company and regularly update it. For example, it can create a data governance policy that defines the confidential data restricted from user access, requires anonymization of certain datasets, and designates a data governance expert or team to continually oversee its Microsoft 365 data practices. Formalizing governance requirements in this manner helps create accountability across an organization.

Robust governance is crucial to ensure that both Microsoft 365 and Copilot comply with legal and ethical data standards. As a best practice, organizations should appoint cross-functional data, security, and compliance teams to enact data restrictions, anonymization, stewardship, policies, and training.

Organizations should also keep in mind that while initial audits, access restrictions, and governance policies are crucial when first deploying Microsoft 365 and Copilot, they should view data governance as an iterative, continuous process. Why? Because data assets and usage patterns inevitably evolve over time. For example:

  • Organizations add new data repositories as business needs change. Each new repository requires an audit and proper access controls.
  • Users and permissions change. In business, change is a constant occurrence, especially as employees come and go. For example, new employees join or existing employees change roles. Administrators should grant/revoke access accordingly.
  • Regulations and compliance requirements change. Data policies must reflect any new restrictions.
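
To make the "users and permissions change" case concrete, the following minimal Python sketch compares the access each user currently holds with the access their present role calls for. The access_changes function and both dictionaries are hypothetical stand-ins for data that an administrator would export from their own tools. The sketch doesn't read or change any real permissions.

```python
def access_changes(current, expected):
    """Compare current access with expected access, per user.

    Both arguments map a user name to a set of site or library names.
    Returns two dicts: what to grant and what to revoke for each user.
    """
    to_grant, to_revoke = {}, {}
    for user in current.keys() | expected.keys():
        have = current.get(user, set())
        want = expected.get(user, set())
        missing = want - have   # needed for the role but not yet granted
        excess = have - want    # still granted but no longer needed
        if missing:
            to_grant[user] = missing
        if excess:
            to_revoke[user] = excess
    return to_grant, to_revoke
```

A user who left the organization simply has no entry in expected, so all of their remaining access appears in the revoke result. Treating the result as a review list rather than applying it automatically keeps an administrator in the loop.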

To keep pace, organizations should perform regular reviews and updates, such as:

  • Monthly audits of new data sources that might require access changes.
  • Quarterly scans of permissions and external sharing to identify any new overexposure.
  • Annual policy reviews to update for new regulations and refresh employee training.
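
The review cadence above can be tracked with very little tooling. The following Python sketch is one possible approach, not a prescribed one: the review names, the day counts used to approximate "monthly," "quarterly," and "annual," and the due_reviews function are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Cadences from the list above; the day counts are approximations.
REVIEW_INTERVALS = {
    "new data source audit": timedelta(days=30),
    "permissions and external sharing scan": timedelta(days=90),
    "policy and training review": timedelta(days=365),
}


def due_reviews(last_completed, today):
    """Return the names of reviews that are due as of today.

    last_completed maps a review name to the date it last ran.
    A review with no recorded date is treated as due.
    """
    due = []
    for name, interval in REVIEW_INTERVALS.items():
        last = last_completed.get(name)
        if last is None or today - last >= interval:
            due.append(name)
    return due
```

For example, if the audit last ran on May 1 and the sharing scan on April 1, calling due_reviews on June 15 of the same year returns the audit (45 days elapsed) and the policy review (never recorded), but not the sharing scan (75 days elapsed).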

Appointing a data governance expert or governance team to oversee this continuous process helps establish data policies and access controls as a living, evolving practice. It also enables the organization to adapt its Microsoft 365 and Copilot governance as circumstances change.