Finding content in sites in eDiscovery (preview)

When you search for documents and files in SharePoint or OneDrive sites, consider adjusting your query approach based on the metadata of the documents and files of interest. Files and documents have relevant properties such as Author, Created, CreatedBy, FileName, LastModifiedTime, and Title. Most of these properties aren't relevant when searching for communications content in Exchange Online, and a query that uses them across both documents and communications may return unexpected results. Additionally, the FileName and Title of a document may not be the same, so using one or the other to find a file with specific content may lead to different or inaccurate results. Keep these properties in mind when searching for specific document and file content in SharePoint and OneDrive.

For example, to find content related to documents created by User 1, for a project called Tradewinds, for specific files named Financials, and from January 2020 to January 2022, you might use a query with the following properties:

  • Add User 1 as a data source to the search.
  • Select User 1's OneDrive site as the location of interest.
  • Add additional groups and their associated SharePoint sites related to the project as data sources.
  • For FileName, use Financials
  • For Keyword, use Tradewinds
  • For Date Range, use the January 1, 2020 to January 31, 2022 range
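
As a rough sketch, the FileName, Keyword, and Date Range conditions in this example could be combined into a single keyword query string. The helper name build_query is hypothetical, and mapping the date range to the Created property is an assumption; the date range condition in the search tool may target a different date property. The data sources (User 1's OneDrive site and the project's SharePoint sites) are search locations and aren't part of the query text.

```python
from datetime import date


def build_query(file_name: str, keyword: str, start: date, end: date) -> str:
    """Join the example's conditions with AND into one keyword query."""
    parts = [
        f"filename:{file_name}",
        keyword,
        # Assumption: the date range is applied to the Created property.
        f"created>={start.isoformat()}",
        f"created<={end.isoformat()}",
    ]
    return " AND ".join(parts)


query = build_query("Financials", "Tradewinds", date(2020, 1, 1), date(2022, 1, 31))
print(query)
```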

Searchable site properties

The following table lists the SharePoint and OneDrive properties that can be searched by using the eDiscovery search tools in the Microsoft Purview portal or by using the New-ComplianceSearch or the Set-ComplianceSearch cmdlet.

Important

While documents and files stored on SharePoint and OneDrive may have other properties supported in other Microsoft 365 services, only the document and file properties listed in this table are supported in eDiscovery search tools. Attempting to include other document or file properties in searches isn't supported.

The table includes an example of the property:value syntax for each property and a description of the search results returned by the examples.

| Property | Property description | Example | Search results returned by the examples |
| --- | --- | --- | --- |
| Author | The author field from Office documents, which persists if a document is copied. For example, if a user creates a document and then emails it to someone else who then uploads it to SharePoint, the document still retains the original author. Be sure to use the user's display name for this property. | author:"Garth Fort" | All documents that are authored by Garth Fort. |
| ContentType | The SharePoint content type of an item, such as Item, Document, or Video. | contenttype:document | All documents. |
| Created | The date that an item is created. | created>=2021-06-01 | All items created on or after June 1, 2021. |
| CreatedBy | The person that created or uploaded an item. Be sure to use the user's display name for this property. | createdby:"Garth Fort" | All items created or uploaded by Garth Fort. |
| DetectedLanguage | The language of an item. | detectedlanguage:english | All items in English. |
| DocumentLink | The path (URL) of a specific folder on a SharePoint or OneDrive site. If you use this property, be sure to search the site that the specified folder is located in. We recommend using this property instead of the Site and Path properties. To return items located in subfolders of the folder that you specify for the documentlink property, add /* to the URL of the specified folder; for example, documentlink:"https://contoso.sharepoint.com/Shared Documents/*". For more information about searching for the documentlink property and using a script to obtain the documentlink URLs for folders on a specific site, see Search for targeted searches. | (1) documentlink:"https://contoso-my.sharepoint.com/personal/garthf_contoso_com/Documents/Private" (2) documentlink:"https://contoso-my.sharepoint.com/personal/garthf_contoso_com/Documents/Shared with Everyone/*" AND filename:confidential | The first example returns all items in the specified OneDrive folder. The second example returns documents in the specified site folder (and all subfolders) that contain the word "confidential" in the file name. |
| FileExtension | The extension of a file; for example, docx, one, pptx, or xlsx. | fileextension:xlsx | All Excel files (Excel 2007 and later). |
| FileName | The name of a file. | (1) filename:"marketing plan" (2) filename:estimate | The first example returns files with the exact phrase "marketing plan" in the file name. The second example returns files with the word "estimate" in the file name. |
| LastModifiedTime | The date that an item was last changed. | (1) lastmodifiedtime>=2021-05-01 (2) lastmodifiedtime>=2021-05-01 AND lastmodifiedtime<=2021-06-01 | The first example returns items that were changed on or after May 1, 2021. The second example returns items changed between May 1, 2021 and June 1, 2021. |
| ModifiedBy | The person who last changed an item. Be sure to use the user's display name for this property. | modifiedby:"Garth Fort" | All items that were last changed by Garth Fort. |
| SharedWithUsersOWSUser | Documents that have been shared with the specified user and displayed on the Shared with me page in the user's OneDrive site. These are documents that have been explicitly shared with the specified user by other people in your organization. When you export documents that match a search query that uses the SharedWithUsersOWSUser property, the documents are exported from the original content location of the person who shared the document with the specified user. For more information, see Searching for site content shared within your organization. | (1) sharedwithusersowsuser:garthf (2) sharedwithusersowsuser:"garthf@contoso.com" | Both examples return all internal documents that have been explicitly shared with Garth Fort and that appear on the Shared with me page in Garth Fort's OneDrive account. |
| Size | The size of an item, in bytes. | (1) size>=1 (2) size:1..10000 | The first example returns items that are 1 byte or larger. The second example returns items from 1 through 10,000 bytes in size. |
| Title | The title of the document. The Title property is metadata that's specified in Microsoft Office documents. It's different from the file name of the document. | title:"communication plan" | Any document that contains the phrase "communication plan" in the Title metadata property of an Office document. |
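
The DocumentLink rule about subfolders can be illustrated with a small sketch. The helper name documentlink_query and its include_subfolders parameter are hypothetical; the function only assembles the query text described for the DocumentLink property and doesn't run a search.

```python
def documentlink_query(folder_url: str, include_subfolders: bool = False) -> str:
    """Build a documentlink condition for a folder URL."""
    url = folder_url.rstrip("/")
    if include_subfolders:
        # Appending /* extends the match to items in subfolders.
        url += "/*"
    # Quote the URL because folder paths can contain spaces.
    return f'documentlink:"{url}"'


private = documentlink_query(
    "https://contoso-my.sharepoint.com/personal/garthf_contoso_com/Documents/Private"
)
shared = documentlink_query(
    "https://contoso-my.sharepoint.com/personal/garthf_contoso_com/Documents/Shared with Everyone",
    include_subfolders=True,
) + " AND filename:confidential"
print(private)
print(shared)
```

The two printed queries correspond to the two DocumentLink examples for Garth Fort's OneDrive folders.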

Searchable sensitive data types

You can use eDiscovery search tools in the Microsoft Purview portal to search for sensitive data, such as credit card numbers or social security numbers, stored in documents on SharePoint and OneDrive sites. You can do this by using the SensitiveType property and the name (or ID) of a sensitive information type in a keyword query. For example, the query SensitiveType:"Credit Card Number" returns documents that contain a credit card number. The query SensitiveType:"U.S. Social Security Number (SSN)" returns documents that contain a U.S. social security number.

To see a list of the sensitive information types that you can search for, go to Data classifications > Sensitive info types in the Microsoft Purview portal. Or you can use the Get-DlpSensitiveInformationType cmdlet in Security & Compliance PowerShell to display a list of sensitive information types.

Limitations for searching sensitive data types

  • To search for custom sensitive information types, you have to specify the ID of the sensitive information type in the SensitiveType property. Using the name of a custom sensitive information type (as shown in the example for built-in sensitive information types in the previous section) returns no results. Use the Publisher column on the Sensitive info types page in the Microsoft Purview portal (or the Publisher property in PowerShell) to differentiate between built-in and custom sensitive information types. Built-in sensitive data types have a value of Microsoft Corporation for the Publisher property.

    To display the name and ID for the custom sensitive data types in your organization, run the following command in Security & Compliance PowerShell:

    Get-DlpSensitiveInformationType | Where-Object {$_.Publisher -ne "Microsoft Corporation"} | Format-Table Name,Id
    

    Then you can use the ID in the SensitiveType search property to return documents that contain the custom sensitive data type; for example, SensitiveType:7e13277e-6b04-3b68-94ed-1aeb9d47de37

  • You can't use sensitive information types and the SensitiveType search property to search for sensitive data at rest in Exchange Online mailboxes. This includes 1:1 chat messages, 1:N group chat messages, and team channel conversations in Microsoft Teams because all of this content is stored in mailboxes. However, you can use data loss prevention (DLP) policies to protect sensitive email data in transit. For more information, see Learn about data loss prevention and Search for and find personal data.
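
The name-versus-ID rule for built-in and custom sensitive information types can be sketched as follows. The SensitiveInfoType class, the sensitive_type_term helper, the custom type name "Contoso Project Code", and the publisher "Contoso" are all hypothetical; the ID is the one from the example above, and the built-in type's ID is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class SensitiveInfoType:
    name: str
    type_id: str
    publisher: str


def sensitive_type_term(sit: SensitiveInfoType) -> str:
    """Return a SensitiveType condition, using the ID for custom types."""
    if sit.publisher == "Microsoft Corporation":
        # Built-in types can be referenced by name.
        return f'SensitiveType:"{sit.name}"'
    # Custom types are referenced by ID; searching by name returns no results.
    return f"SensitiveType:{sit.type_id}"


built_in = SensitiveInfoType("Credit Card Number", "placeholder-id", "Microsoft Corporation")
custom = SensitiveInfoType(
    "Contoso Project Code", "7e13277e-6b04-3b68-94ed-1aeb9d47de37", "Contoso"
)
print(sensitive_type_term(built_in))
print(sensitive_type_term(custom))
```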

Forming a basic query

A basic query has three parts: the sensitive type, a count range, and a confidence range, in the form SensitiveType:"<type>|<count range>|<confidence range>". The sensitive type is required; the count range and confidence range are optional.

Sensitive type - required

Queries typically begin with SensitiveType:", followed by an information type name from the sensitive information types inventory and a closing ". For a custom sensitive information type that you created for your organization, specify the ID of the type instead of its name, as described in Limitations for searching sensitive data types. For example, you might be looking for documents that contain credit card numbers.

In such an instance, you'd use the following format: SensitiveType:"Credit Card Number". Because you didn't include a count range or confidence range, the query returns every document in which a credit card number is detected. This is the simplest query that you can run, and it returns the most results. Keep in mind that the spelling and spacing of the sensitive type matter.

Ranges - optional

Both of the next two parts are ranges, so let's quickly examine what a range looks like. In SharePoint queries, a basic range is represented by two numbers separated by two periods, which looks like this: [number]..[number]. For instance, if 10..20 is used, that range would capture numbers from 10 through 20. There are many different range combinations and several are covered in this article.

Let's add a count range to the query. You can use count range to define the number of occurrences of sensitive information a document needs to contain before it's included in the query results. For example, if you want your query to return only documents that contain exactly five credit card numbers, use this: SensitiveType:"Credit Card Number|5". Count range can also help you identify documents that pose high degrees of risk. For example, your organization might consider documents with five or more credit card numbers a high risk. To find documents fitting this criterion, you would use this query: SensitiveType:"Credit Card Number|5..". Alternatively, you can find documents with five or fewer credit card numbers by using this query: SensitiveType:"Credit Card Number|..5".

Confidence range

Finally, confidence range is the level of confidence that the detected sensitive type is actually a match. The values for confidence range work similarly to count range. You can also specify a confidence range without restricting the count. For example, to search for documents with any number of credit card numbers, as long as the confidence is 85 percent or higher, you would use this query: SensitiveType:"Credit Card Number|*|85..".

Important

The asterisk ( * ) is a wildcard character that means any value works. You can use the wildcard character ( * ) either in the count range or in the confidence range, but not in a sensitive type.
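
The way the three parts fit together can be sketched with a small helper. The names format_range and sensitive_type_query are hypothetical, and the code only assembles the query text described in this section; it doesn't validate the sensitive type name.

```python
from typing import Optional, Tuple

Range = Tuple[Optional[int], Optional[int]]


def format_range(low: Optional[int] = None, high: Optional[int] = None) -> str:
    """Format a range as low..high, low.., ..high, a single number, or *."""
    if low is None and high is None:
        return "*"  # wildcard: any value works
    if low is not None and low == high:
        return str(low)  # exact value, such as 5
    low_text = "" if low is None else str(low)
    high_text = "" if high is None else str(high)
    return f"{low_text}..{high_text}"


def sensitive_type_query(
    name: str, count: Optional[Range] = None, confidence: Optional[Range] = None
) -> str:
    """Build SensitiveType:"<type>|<count range>|<confidence range>"."""
    parts = [name]
    if count is not None or confidence is not None:
        # A confidence range needs the count position filled, so use * if absent.
        parts.append(format_range(*(count or (None, None))))
    if confidence is not None:
        parts.append(format_range(*confidence))
    return 'SensitiveType:"' + "|".join(parts) + '"'


print(sensitive_type_query("Credit Card Number", count=(5, None)))
print(sensitive_type_query("Credit Card Number", confidence=(85, None)))
```

The two calls reproduce the "five or more" count query and the "85 percent or higher" confidence query from this section.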

Additional query properties and search operators

Queries in SharePoint also support the LastSensitiveContentScan property, which can help you search for files scanned within a specific timeframe. For query examples that use the LastSensitiveContentScan property, see the Examples of complex queries section.

You can also use SharePoint eDiscovery search properties, such as Author or FileExtension, together with operators to build complex queries. For the list of available properties and operators, see the Using Search Properties and Operators with eDiscovery blog post.

Examples of complex queries

The following examples use different sensitive types, properties, and operators to illustrate how you can refine your queries to find exactly what you're looking for.

Query: SensitiveType:"International Banking Account Number (IBAN)"
Explanation: The name might seem strange because it's so long, but it's the correct name for that sensitive type. Make sure to use exact names from the sensitive information types inventory. For a custom sensitive information type that you created for your organization, use its ID instead, as described in Limitations for searching sensitive data types.

Query: SensitiveType:"Credit Card Number|1..4294967295|1..100"
Explanation: This returns documents with at least one match to the sensitive type "Credit Card Number." The values for each range are the respective minimum and maximum values. A simpler way to write this query is SensitiveType:"Credit Card Number", but where's the fun in that?

Query: SensitiveType:"Credit Card Number|5..25" AND LastSensitiveContentScan:"8/11/2018..8/13/2018"
Explanation: This returns documents with 5 through 25 credit card numbers that were scanned from August 11, 2018 through August 13, 2018.

Query: SensitiveType:"Credit Card Number|5..25" AND LastSensitiveContentScan:"8/11/2018..8/13/2018" NOT FileExtension:XLSX
Explanation: This returns documents with 5 through 25 credit card numbers that were scanned from August 11, 2018 through August 13, 2018. Files with an XLSX extension aren't included in the query results. FileExtension is one of many properties that you can include in a query. For more information, see Using Search Properties and Operators with eDiscovery.

Query: SensitiveType:"Credit Card Number" OR SensitiveType:"U.S. Social Security Number (SSN)"
Explanation: This returns documents that contain either a credit card number or a social security number.

Examples of queries to avoid

Not all queries are created equal. The following examples show queries that don't work in SharePoint and explain why.

Unsupported query: SensitiveType:"Credit Card Number|.."
Reason: You must add at least one number.

Unsupported query: SensitiveType:"NotARule"
Reason: "NotARule" isn't a valid sensitive type name. Only names in the sensitive information types inventory work in eDiscovery queries.

Unsupported query: SensitiveType:"Credit Card Number|0"
Reason: Zero isn't valid as either the minimum value or the maximum value in a range.

Unsupported query: SensitiveType:"Credit  Card Number"
Reason: It might be difficult to see, but there's extra white space between "Credit" and "Card" that makes the query invalid. Use exact sensitive type names from the sensitive information types inventory.

Unsupported query: SensitiveType:"Credit Card Number|1. .3"
Reason: The two-period portion shouldn't be separated by a space.

Unsupported query: SensitiveType:"Credit Card Number| |1..|80.."
Reason: There are too many pipe delimiters (|). Follow this format instead: SensitiveType:"Credit Card Number|1..|80.."

Unsupported query: SensitiveType:"Credit Card Number|1..|80..101"
Reason: Because confidence values represent a percentage, they can't exceed 100. Choose a number from 1 through 100 instead.
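
The reasons given for the unsupported queries above can be turned into a simple pre-check, sketched here under some assumptions. The function names are hypothetical, and known_names stands in for a local copy of the sensitive information types inventory; the check covers only the value between the quotes and doesn't handle custom types referenced by ID.

```python
import re
from typing import List, Optional, Set

# A range is *, a single number, low.., low..high, or ..high.
RANGE_PATTERN = re.compile(r"(\*|\d+|\d+\.\.\d*|\.\.\d+)")


def check_range(text: str, label: str, maximum: Optional[int] = None) -> List[str]:
    """Return the problems found in one count or confidence range."""
    if not RANGE_PATTERN.fullmatch(text):
        return [f"{label} range {text!r} is malformed"]
    problems = []
    for number in re.findall(r"\d+", text):
        if int(number) == 0:
            problems.append(f"{label} range can't use zero")
        elif maximum is not None and int(number) > maximum:
            problems.append(f"{label} range can't exceed {maximum}")
    return problems


def check_sensitive_type(value: str, known_names: Set[str]) -> List[str]:
    """Check the text between the quotes of a SensitiveType condition."""
    name, *ranges = value.split("|")
    problems = []
    if name not in known_names:
        # Also catches misspellings and extra white space in the name.
        problems.append(f"{name!r} isn't a known sensitive type name")
    if len(ranges) > 2:
        problems.append("too many pipe delimiters")
        return problems
    limits = [("count", None), ("confidence", 100)]
    for text, (label, maximum) in zip(ranges, limits):
        problems.extend(check_range(text, label, maximum))
    return problems


known = {"Credit Card Number"}
print(check_sensitive_type("Credit Card Number|1..|80..101", known))
```

An empty list means that none of the listed problems was found; it doesn't guarantee that the search service accepts the query.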

Searching for site content shared with external users

You can also use eDiscovery search tools in the Microsoft Purview portal to search for documents stored on SharePoint and OneDrive sites that have been shared with people outside of your organization. This can help you identify sensitive or proprietary information that's being shared outside your organization. You can do this by using the ViewableByExternalUsers property in a keyword query. This property returns documents or sites that have been shared with external users by using one of the following sharing methods:

  • A sharing invitation that requires users to sign in to your organization as an authenticated user.
  • An anonymous guest link, which allows anyone with this link to access the resource without having to be authenticated.

Here are some examples:

  • The query ViewableByExternalUsers:true AND SensitiveType:"Credit Card Number" returns all items that have been shared with people outside your organization and contain a credit card number.
  • The query ViewableByExternalUsers:true AND ContentType:document AND site:"https://contoso.sharepoint.com/Sites/Teams" returns a list of documents on all team sites in the organization that have been shared with external users.

Tip

A search query such as ViewableByExternalUsers:true AND ContentType:document might return a lot of .aspx files in the search results. To eliminate these (or other types of files), you can use the FileExtension property to exclude specific file types; for example ViewableByExternalUsers:true AND ContentType:document NOT FileExtension:aspx.
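
As a small sketch of this tip, the following hypothetical helper appends a NOT FileExtension condition for each file type to exclude. Chaining several NOT conditions in one query is an assumption; the tip itself shows only a single exclusion.

```python
from typing import Iterable


def exclude_extensions(query: str, extensions: Iterable[str]) -> str:
    """Append a NOT FileExtension condition for each extension."""
    # Accept "aspx" or ".aspx" by dropping any leading period.
    exclusions = [f"NOT FileExtension:{ext.lstrip('.')}" for ext in extensions]
    return " ".join([query, *exclusions])


base = "ViewableByExternalUsers:true AND ContentType:document"
filtered = exclude_extensions(base, ["aspx"])
print(filtered)
```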

Content is considered shared with people outside your organization when documents in your organization's SharePoint and OneDrive sites are shared by sending a sharing invitation or are shared in public locations. For example, the following user activities result in content that is viewable by external users:

  • A user shares a file or folder with a person outside your organization.
  • A user creates and sends a link to a shared file to a person outside your organization. This link allows the external user to view (or edit) the file.
  • A user sends a sharing invitation or a guest link to a person outside your organization to view (or edit) a shared file.

Issues using the ViewableByExternalUsers property

While the ViewableByExternalUsers property represents the status of whether a document or site is shared with external users, there are some caveats to what this property does and doesn't reflect. In the following scenarios, the value of the ViewableByExternalUsers property won't be updated, and the results of a search query that uses this property may be inaccurate.

  • Changes to sharing policy, such as turning off external sharing for a site or for the organization. The property will still show previously shared documents as being externally accessible even though external access might have been revoked.
  • Changes to group membership, such as adding external users to, or removing them from, Microsoft 365 Groups or Microsoft 365 security groups. The property won't automatically be updated for items the group has access to.
  • Sending sharing invitations to external users where the recipient hasn't accepted the invitation, and therefore doesn't yet have access to the content.

In these scenarios, the ViewableByExternalUsers property won't reflect the current sharing status until the site or document library is recrawled and reindexed.

Searching for site content shared within your organization

You can use the SharedWithUsersOWSUser property to search for documents that have been shared between people in your organization. When a person shares a file (or folder) with another user inside your organization, a link to the shared file appears on the Shared with me page in the OneDrive account of the person the file was shared with. For example, to search for the documents that have been shared with Sara Davis, you can use the query SharedWithUsersOWSUser:"sarad@contoso.com". If you export the results of this search, the original documents (located in the content location of the person who shared the documents with Sara) are downloaded.

Documents must be explicitly shared with a specific user to be returned in search results when using the SharedWithUsersOWSUser property. For example, when a person shares a document in their OneDrive account, they have the option to share it with anyone (inside or outside the organization), share it only with people inside the organization, or share it with a specific person.

Only documents that are shared by using the third option (shared with Specific people) are returned by a search query that uses the SharedWithUsersOWSUser property.

Searching for Skype for Business conversations

You can use the following keyword query to specifically search for content in Skype for Business conversations:

kind:im

The previous search query also returns chats from Microsoft Teams. To prevent this, you can narrow the search results to include only Skype for Business conversations by using the following keyword query:

kind:im AND subject:conversation

The previous keyword query excludes chats in Microsoft Teams because Skype for Business conversations are saved as email messages with a Subject line that starts with the word "Conversation".

To search for Skype for Business conversations that occurred within a specific date range, use the following keyword query:

kind:im AND subject:conversation AND (received=startdate..enddate)
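
To illustrate how startdate and enddate might be filled in, here is a minimal sketch. The helper name is hypothetical, and the YYYY-MM-DD date format is an assumption based on the date examples for the Created and LastModifiedTime properties.

```python
from datetime import date


def skype_conversations_query(start: date, end: date) -> str:
    """Build the Skype for Business query for a received date range."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return (
        "kind:im AND subject:conversation "
        f"AND (received={start.isoformat()}..{end.isoformat()})"
    )


skype_query = skype_conversations_query(date(2021, 5, 1), date(2021, 6, 1))
print(skype_query)
```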

Character limits for searches

For more information about character limits, see eDiscovery search limits.