This article describes how to configure and install the Microsoft Purview Information Protection scanner, formerly named Azure Information Protection unified labeling scanner, or the on-premises scanner.
Tip
While most customers will perform these procedures in the admin portal, you may need to work in PowerShell only.
If you don't have access to the scanner pages in the Microsoft Purview portal or the Microsoft Purview compliance portal, configure any scanner settings in PowerShell only. For more information, see Use PowerShell to configure the scanner and Supported PowerShell cmdlets.
Configure the scanner settings
Before you install the scanner, or upgrade it from an older general availability version, configure or verify your scanner settings. For this configuration, you can use either the Microsoft Purview portal or the Microsoft Purview compliance portal.
To configure your scanner in the Microsoft Purview portal or Microsoft Purview compliance portal:
Sign in using one of the following roles:
Compliance Administrator
Compliance Data Administrator
Security Administrator
Organization Management
Depending on the portal you're using, navigate to one of the following locations:
Create a scanner cluster. This cluster defines your scanner and is used to identify the scanner instance, such as during installation, upgrades, and other processes.
To create a scanner cluster in the Microsoft Purview portal or Microsoft Purview compliance portal:
From the tabs on the Information protection scanner page, select Clusters.
On the Clusters tab, select Add.
On the New cluster pane, enter a meaningful name for the scanner, and an optional description.
The cluster name is used to identify the scanner's configurations and repositories. For example, you might enter Europe to identify the geographical locations of the data repositories you want to scan.
You'll use this name later on to identify where you want to install or upgrade your scanner.
Select Save to save your changes.
Create a content scan job
Deep dive into your content to scan specific repositories for sensitive content.
To create your content scan job in the Microsoft Purview portal or Microsoft Purview compliance portal:
From the tabs on the Information protection scanner page, select Content scan jobs.
On the Content scan jobs pane, select Add.
For this initial configuration, configure the following settings, and then select Save.
Content scan job settings:
- Schedule: Keep the default of Manual
- Info types to be discovered: Change to Policy only
DLP policy:
- If you're using a data loss prevention policy, set Enable DLP rules to On. For more information, see Use a DLP policy.
Sensitivity policy:
- Enforce sensitivity labeling policy: Select Off
- Label files based on content: Keep the default of On
- Default label: Keep the default of Policy default
- Relabel files: Keep the default of Off
Configure file settings:
- Preserve "Date modified", "Last modified" and "Modified by": Keep the default of On
- File types to scan: Keep the default file types for Exclude
- Default owner: Keep the default of Scanner Account
- Set repository owner: Use this option only when using a DLP policy.
Open the content scan job that was saved, and select the Repositories tab to specify the data stores to be scanned.
Specify UNC paths and SharePoint Server URLs for SharePoint on-premises document libraries and folders.
Note
SharePoint Server 2019, SharePoint Server 2016, and SharePoint Server 2013 are supported for SharePoint. SharePoint Server 2010 is also supported when you have extended support for this version of SharePoint.
To add your first data store, while on the Repositories tab:
On the Repositories pane, select Add:
On the Repository pane, specify the path for the data repository, and then select Save.
For a network share, use \\Server\Folder.
For a SharePoint library, use http://sharepoint.contoso.com/Shared%20Documents/Folder.
For a local path: C:\Folder
For a UNC path: \\Server\Folder
Note
Wildcards and WebDAV locations are not supported.
Scanning of OneDrive locations as repositories is not supported.
If you add a SharePoint path for Shared Documents:
Specify Shared Documents in the path when you want to scan all documents and all folders from Shared Documents.
For example: http://sp2013/SharedDocuments
Specify Documents in the path when you want to scan all documents and all folders from a subfolder under Shared Documents.
For example: http://sp2013/Documents/SalesReports
For the remaining settings on this pane, don't change them for this initial configuration, but keep them as Content scan job default. The default setting means that the data repository inherits the settings from the content scan job.
Use the following syntax when adding SharePoint paths:
Root path:
- http://<SharePoint server name>
- Scans all sites, including any site collections allowed for the scanner user. Requires additional permissions to automatically discover root content.
Specific SharePoint subsite or collection, one of the following:
- http://<SharePoint server name>/<subsite name>
- http://<SharePoint server name>/<site collection name>/<site name>
Specific SharePoint library, one of the following:
- http://<SharePoint server name>/<library name>
- http://<SharePoint server name>/.../<library name>
Specific SharePoint folder:
- http://<SharePoint server name>/.../<folder name>
Repeat the previous steps to add as many repositories as needed.
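Before entering many paths in the portal, it can help to sanity-check their form. The following is a minimal sketch, not part of the scanner tooling: Get-RepositoryPathKind is a hypothetical helper that only classifies a string by the path forms listed above and rejects the * wildcard. It does not check that the location exists, and it does not detect WebDAV or OneDrive locations.

```powershell
# Hypothetical helper (not a scanner cmdlet): classifies a repository path by its form.
function Get-RepositoryPathKind {
    param([Parameter(Mandatory)][string]$Path)

    # The scanner does not support wildcards; this sketch checks for '*' only.
    if ($Path.Contains('*')) { return 'Unsupported: wildcard' }

    # SharePoint Server library or folder URL.
    if ($Path -match '^https?://') { return 'SharePoint URL' }

    # UNC path such as \\Server\Folder.
    if ($Path -match '^\\\\[^\\]+\\[^\\]+') { return 'UNC path' }

    # Local path such as C:\Folder.
    if ($Path -match '^[A-Za-z]:\\') { return 'Local path' }

    return 'Unrecognized'
}

# Sample paths taken from the examples in this article, plus one wildcard path.
'\\Server\Folder',
'C:\Folder',
'http://sharepoint.contoso.com/Shared%20Documents/Folder',
'\\Server\*' |
    ForEach-Object { '{0} -> {1}' -f $_, (Get-RepositoryPathKind -Path $_) }
```

A path reported as Unrecognized or Unsupported is worth reviewing before you add it as a repository.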
You're now ready to install the scanner with the content scan job that you've created. Continue with Install the scanner.
Install the scanner
After you've configured the scanner, perform the following steps to install the scanner. This procedure is performed fully in PowerShell.
Sign in to the Windows Server computer that will run the scanner. Use an account that has local administrator rights and that has permissions to write to the SQL Server master database.
Important
You must have the information protection client installed on your machine before installing the scanner.
Open a Windows PowerShell session with the Run as an administrator option.
Run the Install-Scanner cmdlet, specifying your SQL Server instance on which to create a database for the information protection scanner, and the scanner cluster name that you specified in the preceding section:
Examples, using the scanner cluster name of Europe:
For a default instance: Install-Scanner -SqlServerInstance SQLSERVER1 -Cluster Europe
For a named instance: Install-Scanner -SqlServerInstance SQLSERVER1\SCANNER -Cluster Europe
For SQL Server Express: Install-Scanner -SqlServerInstance SQLSERVER1\SQLEXPRESS -Cluster Europe
When you're prompted, provide the Active Directory credentials for the scanner service account.
Use the following syntax: <domain\user name>. For example: contoso\scanneraccount
Verify that the service is now installed by using Administrative Tools > Services.
The installed service is named Microsoft Purview Information Protection Scanner and is configured to run by using the scanner service account that you created.
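The same check can be done from PowerShell instead of the Services console. This is a minimal sketch that assumes the service display name is exactly as stated above; it uses only the standard Get-Service and Get-CimInstance cmdlets.

```powershell
# Display name taken from this article; adjust it if your installation differs.
$displayName = 'Microsoft Purview Information Protection Scanner'

$service = Get-Service -DisplayName $displayName -ErrorAction SilentlyContinue

if ($null -eq $service) {
    Write-Warning "Service '$displayName' was not found. Check that Install-Scanner completed."
}
else {
    # Win32_Service exposes StartName, the account the service runs as.
    Get-CimInstance -ClassName Win32_Service -Filter "Name='$($service.Name)'" |
        Select-Object Name, DisplayName, State, StartName
}
```

The StartName column should show the scanner service account that you supplied during installation.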
Get a Microsoft Entra token for the scanner
A Microsoft Entra token allows the scanner to authenticate to the Microsoft Purview Information Protection Scanner service, enabling the scanner to run unattended.
In the Azure portal, go to the Microsoft Entra ID blade.
In the Microsoft Entra ID side pane, select App registrations.
At the top, select + New registration.
In the Name section, enter InformationProtectionScanner.
Leave Supported account types as the default.
For the Redirect URI, leave the type as Web, enter http://localhost as the value, and then select Register.
On the Overview page of this application, note the following IDs in a text editor of your choice: Application (client) ID and Directory (tenant) ID. You'll need these values later when you run the Set-Authentication command.
On the side pane, navigate to Certificates and Secrets.
Select + New client secret.
In the dialog box that appears, enter a description for your secret, set it to expire in 1 year, and then add the secret.
Under the client secrets section, an entry now appears with the secret value. Copy this value and store it in the file where you saved the client ID and tenant ID. This is the only time you can see the secret value; it can't be recovered if you don't copy it now.
On the side pane, navigate to API Permissions.
Select Add a permission.
When the screen appears, select Azure Rights Management Service, and then select Application Permissions.
Expand the Content drop-down and select Content.DelegatedReader and Content.DelegatedWriter. Then, at the bottom of the screen, select Add Permissions.
Navigate back to the API Permissions section and add another permission.
This time, in the Select an API section, select APIs my organization uses. In the search bar, enter Microsoft Information Protection Sync Service and select it.
Select Application Permissions, and then in the Unified Policy drop-down, select the UnifiedPolicy.Tenant.Read permission. Then, at the bottom of the screen, select Add Permissions.
Back on the API Permissions screen, select Grant Admin Consent and confirm that the operation succeeded (indicated by a green check mark).
From the Windows Server computer, if your scanner service account has been granted the Log on locally right for the installation, sign in with this account and start a PowerShell session.
Run Set-Authentication, specifying the values that you copied from the previous step:
PowerShell
Set-Authentication -AppId <ID of the registered app> -AppSecret <client secret string> -TenantId <your tenant ID> -DelegatedUser <Azure AD account>
For example:
PowerShell
$pscreds = Get-Credential CONTOSO\scanner
Set-Authentication -AppId "77c3c1c3-abf9-404e-8b2b-4652836c8c66" -AppSecret "OAkk+rnuYc/u+]ah2kNxVbtrDGbS47L4" -DelegatedUser scanner@contoso.com -TenantId "9c11c87a-ac8b-46a3-8d5c-f4d0b72ee29a" -OnBehalfOf $pscreds
Acquired application access token on behalf of CONTOSO\scanner.
The scanner now has a token to authenticate to Microsoft Entra ID. This token is valid for one year, two years, or indefinitely, depending on how you configured the client secret for the app registration in Microsoft Entra ID. When the token expires, you must repeat this procedure.
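If you'd rather not type the client secret into the command line, where it can end up in command history, you can prompt for it. The following is a sketch under stated assumptions: it uses the same Set-Authentication parameters shown above, the placeholder strings stand for the values you noted during app registration, and the account names are the article's sample values.

```powershell
# Placeholders: replace with the values you noted from the app registration.
$appId    = '<ID of the registered app>'
$tenantId = '<your tenant ID>'

# Prompt for the secret so it is not stored in the script or typed in clear text.
$secureSecret = Read-Host -Prompt 'Client secret' -AsSecureString

# Set-Authentication is shown above taking the secret as a string,
# so convert the SecureString back to plain text only at the point of use.
$appSecret = [System.Net.NetworkCredential]::new('', $secureSecret).Password

# Credentials of the scanner service account (sample account name from this article).
$pscreds = Get-Credential CONTOSO\scanner

Set-Authentication -AppId $appId -AppSecret $appSecret -TenantId $tenantId -DelegatedUser 'scanner@contoso.com' -OnBehalfOf $pscreds
```

The secret still exists in memory as plain text for the duration of the session, so close the PowerShell session when you're done.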
Continue using one of the following steps, depending on whether you're using the compliance portal to configure your scanner, or PowerShell only:
If you are configuring and installing your scanner using PowerShell instead of the scanner pages in the compliance portal, continue with step 5 in Use PowerShell to configure the scanner.
To configure the scanner to apply classification and protection in the Microsoft Purview portal or Microsoft Purview compliance portal:
In the Microsoft Purview portal or Microsoft Purview compliance portal, on the Content scan jobs tab, select a specific content scan job to edit it.
Select the content scan job, change the following, and then select Save:
From the Content scan job section: Change the Schedule to Always
From the Enforce sensitivity labeling policy section: Change the radio button to On
Make sure a node for the content scan job is online, then start the content scan job again by selecting Scan now. The Scan now button only appears when a node for the selected content scan job is online.
The scanner is now scheduled to run continuously. When the scanner works its way through all configured files, it automatically starts a new cycle so that any new and changed files are discovered.
Use a DLP policy
Using a data loss prevention policy enables the scanner to detect potential data leaks by matching DLP rules to files stored in file shares and SharePoint Server.
Enable DLP rules in your content scan job to reduce the exposure of any files that match your DLP policies. When your DLP rules are enabled, the scanner may reduce file access to data owners only, or reduce exposure to network-wide groups, such as Everyone, Authenticated Users, or Domain Users.
In the Microsoft Purview portal or Microsoft Purview compliance portal, determine whether you are just testing your DLP policy or whether you want your rules enforced and your file permissions changed according to those rules. For more information, see Create and Deploy data loss prevention policies.
Scanning your files, even when just testing the DLP policy, also creates file permission reports. Query these reports to investigate specific file exposures or explore the exposure of a specific user to scanned files.
To use a DLP policy with the scanner in the Microsoft Purview portal or Microsoft Purview compliance portal:
In the Microsoft Purview portal or Microsoft Purview compliance portal, navigate to the Content scan jobs tab and select a specific content scan job. For more information, see Create a content scan job.
Under Enable DLP policy rules, set the radio button to On.
Important
Do not set Enable DLP rules to On unless you actually have a DLP policy configured in Microsoft 365.
Turning this feature on without a DLP policy will cause the scanner to generate errors.
(Optional) Turn on the Set repository owner option, and define a specific user as the repository owner.
This option enables the scanner to reduce the exposure of any files found in this repository, which match the DLP policy, to the repository owner defined.
DLP policies and make private actions
If you are using a DLP policy with a make private action, and are also planning to use the scanner to automatically label your files, we recommend that you also define the unified labeling client's UseCopyAndPreserveNTFSOwner advanced setting.
This setting ensures that the original owners retain access to their files.
By default the scanner protects Office file types and PDF files only.
Use PowerShell commands to change this behavior as needed, such as to configure the scanner to protect all file types, just as the client does, or to protect additional, specific file types.
For a label policy that applies to the user account downloading labels for the scanner, specify a PowerShell advanced setting named PFileSupportedExtensions.
For a scanner that has access to the internet, this user account is the account that you specify for the DelegatedUser parameter with the Set-Authentication command.
Example 1: PowerShell command for the scanner to protect all file types, where your label policy is named "Scanner":
Example 2: PowerShell command for the scanner to protect .xml files and .tiff files in addition to Office files and PDF files, where your label policy is named "Scanner":
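The two examples can be sketched as follows. This assumes that the label policy advanced setting is applied with the Set-LabelPolicy cmdlet from Security & Compliance PowerShell, which requires the ExchangeOnlineManagement module and a Connect-IPPSSession connection; the policy name Scanner comes from the examples above.

```powershell
# Connect to Security & Compliance PowerShell (requires the ExchangeOnlineManagement module).
Connect-IPPSSession

# Example 1: protect all file types for the label policy named "Scanner".
Set-LabelPolicy -Identity Scanner -AdvancedSettings @{PFileSupportedExtensions="*"}

# Example 2: protect .xml and .tiff files in addition to Office files and PDF files.
Set-LabelPolicy -Identity Scanner -AdvancedSettings @{PFileSupportedExtensions=ConvertTo-Json(".xml", ".tiff")}
```

Run only one of the two commands, because each call sets the value of PFileSupportedExtensions for the policy.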
Use the Export and Import buttons to make changes for your scanner across several repositories.
This way, you don't need to make the same changes several times, manually, in the Microsoft Purview portal or Microsoft Purview compliance portal.
For example, if you have a new file type on several SharePoint data repositories, you may want to update the settings for those repositories in bulk.
To make changes in bulk across repositories in the Microsoft Purview portal or Microsoft Purview compliance portal:
In the Microsoft Purview portal or Microsoft Purview compliance portal, select a specific content scan job and navigate to the Repositories tab within the pane. Select the Export option.
Manually edit the exported file to make your change.
Use the Import option on the same page to import the updates back across your repositories.
Use the scanner with alternative configurations
The scanner usually looks for conditions specified for your labels in order to classify and protect your content as needed.
In the following scenarios, the scanner is also able to scan your content and manage labels, without any conditions configured:
Apply a default label to all files in a data repository
In this configuration, all unlabeled files in the repository are labeled with the default label specified for the repository or the content scan job. Files are labeled without inspection.
Configure the following settings:
- Label files based on content: Set to Off
- Default label: Set to Custom, and then select the label to use
- Enforce default label: To apply the default label to all files, even those that are already labeled, turn on both Relabel files and Enforce default label
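If you configure the scanner with PowerShell only, a roughly equivalent command might look like the following sketch. The -DefaultLabelType parameter appears elsewhere in this article; the -LabelFilesByContent and -DefaultLabelId parameter names and the Custom value are assumptions that mirror the portal settings, so confirm them with Get-Help before relying on them. The label ID is a placeholder.

```powershell
# List the label-related parameters that your installed version actually supports.
Get-Help Set-ScannerContentScan -Parameter *Label*

# Assumption: parameter names and values mirror the portal settings listed above.
# '<label ID>' is a placeholder for the ID of the label to apply.
Set-ScannerContentScan -LabelFilesByContent Off -DefaultLabelType Custom -DefaultLabelId '<label ID>'
```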
Remove existing labels from all files in a data repository
In this configuration, all existing labels are removed, including protection, if protection was applied with the label. Protection applied independently of a label is retained.
Configure the following settings:
- Label files based on content: Set to Off
- Default label: Set to None
- Relabel files: Set to On, with Enforce default label set to On
Identify all custom conditions and known sensitive information types
This configuration enables you to find sensitive information that you might not realize you had, at the expense of scanning rates for the scanner.
Set the Info types to be discovered to All.
To identify conditions and information types for labeling, the scanner uses any custom sensitive information types specified, and the list of built-in sensitive information types that are available to select, as defined in your labeling management center.
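For a PowerShell-only configuration, the same change can be sketched with the -DiscoverInformationTypes parameter that this article uses with the PolicyOnly value. The All value is an assumption that mirrors the portal option.

```powershell
# Assumption: 'All' is the PowerShell counterpart of the portal's All option.
Set-ScannerContentScan -DiscoverInformationTypes All
```

Expect slower scan cycles with this setting, as described above.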
Optimize scanner performance
Note
If you're looking to improve the responsiveness of the scanner computer rather than the scanner performance, use an advanced client setting to limit the number of threads used by the scanner.
Use the following options and guidance to help you optimize scanner performance:
Have a high-speed and reliable network connection between the scanner computer and the scanned data store:
- For example, place the scanner computer in the same LAN, or preferably, in the same network segment as the scanned data store.
- The quality of the network connection affects the scanner performance because, to inspect the files, the scanner transfers the contents of the files to the computer running the Microsoft Purview Information Protection Scanner service.
- Reducing or eliminating the network hops required for the data to travel also reduces the load on your network.
Make sure the scanner computer has available processor resources:
- Inspecting the file contents and encrypting and decrypting files are processor-intensive actions.
- Monitor the typical scanning cycles for your specified data stores to identify whether a lack of processor resources is negatively affecting the scanner performance.
Install multiple instances of the scanner:
- The scanner supports multiple configuration databases on the same SQL Server instance when you specify a custom cluster name for the scanner.
- Tip: Multiple scanners can also share the same cluster, resulting in quicker scanning times. If you plan to install the scanner on multiple machines with the same database instance, and want your scanners to run in parallel, you must install all your scanners using the same cluster name.
Check your alternative configuration usage:
- The scanner runs more quickly when you use the alternative configuration to apply a default label to all files because the scanner doesn't inspect the file contents.
- The scanner runs more slowly when you use the alternative configuration to identify all custom conditions and known sensitive information types.
Additional factors that affect performance
Additional factors that affect the scanner performance include:
Load/response times:
- The current load and response times of the data stores that contain the files to scan will also affect scanner performance.
Scanner mode (Discovery / Enforce):
- Discovery mode typically has a higher scanning rate than enforce mode.
- Discovery requires a single file read action, whereas enforce mode requires read and write actions.
Policy changes:
- Your scanner performance may be affected if you've made changes to the autolabeling in the label policy.
- Your first scan cycle, when the scanner must inspect every file, will take longer than subsequent scan cycles that, by default, inspect only new and changed files.
- If you change the conditions or autolabeling settings, all files are scanned again. For more information, see Rescanning files.
Regex constructions:
- Scanner performance is affected by how your regex expressions for custom conditions are constructed.
- To avoid heavy memory consumption and the risk of timeouts (15 minutes per file), review your regex expressions for efficient pattern matching. For example, avoid greedy quantifiers, and use non-capturing groups such as (?:expression) instead of (expression).
Log level:
- Log level options include Debug, Info, Error and Off for the scanner reports.
- Off results in the best performance.
- Debug considerably slows down the scanner and should be used only for troubleshooting.
Files being scanned:
- With the exception of Excel files, Office files are more quickly scanned than PDF files.
- Unprotected files are quicker to scan than protected files.
- Large files take longer to scan than small files.
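To illustrate the regex guidance on non-capturing groups, the following sketch compares two patterns with the .NET regex engine that PowerShell uses. The sample text and patterns are made up for illustration and are not scanner conditions; the point is only that both forms match the same text while the non-capturing form avoids storing a group.

```powershell
# Illustrative sample text (not real data).
$sample = 'Card: 4111-1111-1111-1111 issued 2024'

# Same pattern written with a capturing group and with a non-capturing group.
$capturing    = [regex]'(\d{4}-){3}\d{4}'
$nonCapturing = [regex]'(?:\d{4}-){3}\d{4}'

# Both patterns match the same text.
$capturing.Match($sample).Value       # 4111-1111-1111-1111
$nonCapturing.Match($sample).Value    # 4111-1111-1111-1111

# Only the capturing form tracks a numbered group in addition to group 0.
$capturing.Match($sample).Groups.Count       # 2
$nonCapturing.Match($sample).Groups.Count    # 1
```

When you don't need the captured text, the non-capturing form gives the engine less to track for each match.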
Use PowerShell to configure the scanner
This section describes the steps required to configure and install the scanner when you don't have access to the scanner pages in the Microsoft Purview portal or Microsoft Purview compliance portal, and must use PowerShell only.
Important
Some steps require PowerShell whether or not you can access the scanner pages in the compliance portal, and are identical in both cases. For these steps, see the earlier instructions in this article as indicated.
Start with PowerShell closed. If you've previously installed the information protection client and scanner, make sure that the Microsoft Purview Information Protection Scanner service is stopped.
Open a Windows PowerShell session with the Run as an administrator option.
Run the Install-Scanner command to install your scanner on your SQL server instance, with the Cluster parameter to define your cluster name.
This step is identical whether or not you're able to access the scanner pages in the compliance portal. For more information, see the earlier instructions in this article: Install the scanner
Get a Microsoft Entra token to use with your scanner, and then reauthenticate.
This step is identical whether or not you're able to access the scanner pages in the compliance portal. For more information, see the earlier instructions in this article: Get a Microsoft Entra token for the scanner.
The only required parameter in the Set-ScannerContentScan cmdlet is Enforce. However, you might want to define other settings for your content scan job at this time. For example:
PowerShell
Set-ScannerContentScan -Schedule Manual -DiscoverInformationTypes PolicyOnly -Enforce Off -DefaultLabelType PolicyDefault -RelabelFiles Off -PreserveFileDetails On -IncludeFileTypes '' -ExcludeFileTypes '.msg,.tmp' -DefaultOwner <account running the scanner>
The syntax above configures the following settings while you continue the configuration:
Keeps the scanner run scheduling to manual
Sets the information types to be discovered based on the sensitivity label policy
Does not enforce a sensitivity label policy
Automatically labels files based on content, using the default label defined for the sensitivity label policy
Does not allow for relabeling files
Preserves file details while scanning and auto-labeling, including date modified, last modified, and modified by values
Sets the scanner to exclude .msg and .tmp files when running
Sets the default owner to the account you want to use when running the scanner
Use the Add-ScannerRepository cmdlet to define the repositories you want to scan in your content scan job. For example, run:
PowerShell
Add-ScannerRepository -OverrideContentScanJob Off -Path 'c:\repoToScan'
Use one of the following syntaxes, depending on the type of repository you're adding:
For a network share, use \\Server\Folder.
For a SharePoint library, use http://sharepoint.contoso.com/Shared%20Documents/Folder.
For a local path: C:\Folder
For a UNC path: \\Server\Folder
Note
Wildcards and WebDAV locations are not supported.
If you add a SharePoint path for Shared Documents:
Specify Shared Documents in the path when you want to scan all documents and all folders from Shared Documents.
For example: http://sp2013/SharedDocuments
Specify Documents in the path when you want to scan all documents and all folders from a subfolder under Shared Documents.
For example: http://sp2013/Documents/SalesReports
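When you have several repositories to add, you can loop over a list instead of running the cmdlet once per path. This is a minimal sketch that uses only the Add-ScannerRepository parameters shown above; the paths in the list are the sample paths from this article and should be replaced with your own.

```powershell
# Sample repository list using the path forms described above; replace with your own paths.
$repositories = @(
    '\\Server\Folder',
    'C:\Folder',
    'http://sharepoint.contoso.com/Shared%20Documents/Folder'
)

foreach ($path in $repositories) {
    # -OverrideContentScanJob Off is the value used in this article's example.
    Add-ScannerRepository -OverrideContentScanJob Off -Path $path
}
```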
Use PowerShell to configure the scanner to apply classification and protection
Run the Set-ScannerContentScan cmdlet to update your content scan job to set your scheduling to always and enforce your sensitivity policy.
PowerShell
Set-ScannerContentScan -Schedule Always -Enforce On
Tip
You may want to change other settings on this pane, such as whether file attributes are changed and whether the scanner can relabel files. For more information about the settings available, see the full Set-ScannerContentScan documentation.
Run the Start-Scan cmdlet to run your content scan job:
PowerShell
Start-Scan
The scanner is now scheduled to run continuously. When the scanner works its way through all configured files, it automatically starts a new cycle so that any new and changed files are discovered.
Use PowerShell to configure a DLP policy with the scanner
Run the Set-ScannerContentScan cmdlet again with the -EnableDLP parameter set to On, and with a specific repository owner defined.
For example:
PowerShell
Set-ScannerContentScan -EnableDLP On -RepositoryOwner 'domain\user'
Supported PowerShell cmdlets
This section lists PowerShell cmdlets supported for the information protection scanner and instructions for configuring and installing the scanner with PowerShell only.