Secure AI File Access and Data Protection Design

Question

Mountain Pond 1,716

Hello.

What do you think about this approach to solving the problem?

Perhaps it's worth reconsidering?

Below, I've summarized my research on this issue and formulated the general idea.

Thanks in advance.

Objective

Employees must be provided access to an AI service for file processing while ensuring that only explicitly approved files can be processed by AI.

Files are uploaded to SharePoint by a source application and should only be available for processing within an authorized AI service.

Requirements

Files must be marked with a specific label

Only a designated AI service should be allowed

Files with the label must only be processable by this approved AI service

Other AI services may be allowed or blocked, but labeled files must only be accessible to the approved AI service

Installation and access to unauthorized AI services must be prevented
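
The requirements above reduce to a small decision rule, which can be sketched as follows. This is only a toy model for checking the intended behavior, not a Purview or Defender API; the service identifier and the `allow_other_services` switch are hypothetical names introduced for illustration.

```python
from typing import Optional

# Hypothetical identifiers, used only for illustration.
APPROVED_SERVICE = "approved-ai-service"
APPROVED_LABEL = "Label-A"


def is_processing_allowed(service: str, label: Optional[str],
                          allow_other_services: bool = False) -> bool:
    """Return True if `service` may process a file carrying `label`."""
    if service == APPROVED_SERVICE:
        # The approved service may only receive explicitly approved files.
        return label == APPROVED_LABEL
    if label == APPROVED_LABEL:
        # Labeled files must never reach any other AI service.
        return False
    # Unlabeled files on other services follow the organization's choice
    # ("other AI services may be allowed or blocked").
    return allow_other_services
```

Any combination of service and label that the deployed policies treat differently from this function points to a gap in the implementation.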

Implementation

1. File Labeling

In Microsoft Purview, a sensitivity label (Label-A) is created and kept hidden from users so that they cannot manage it

An auto-labeling policy is configured in Microsoft Purview targeting the SharePoint site where the files are uploaded

2. Restricting Access to the Approved AI Service

The approved AI service application is registered in Entra ID

SAML SSO integration with Entra ID is configured

A Conditional Access policy is created targeting the AI application

In the Session section, “Use Conditional Access App Control” is enabled with the “Use custom policy” option

After the application appears in Defender for Cloud Apps:

A Session Policy is created

Condition: if a file does not have Label-A applied, uploading the file is blocked

3. Blocking Unauthorized AI Services

In Microsoft Purview, an Inline Web Traffic DLP policy is created

A rule is configured to block uploading files labeled with Label-A to unmanaged AI applications

As a result, files marked with Label-A should not be transferred to AI services that are not integrated with Entra ID

The Edge add-on and corresponding Intune settings must be deployed to endpoint devices

4. Controlling Other AI Services

The following solutions can be used to control additional AI services:

Defender for Cloud Apps

Global Secure Access

However:

Defender for Cloud Apps only works in browser-based scenarios

Global Secure Access does not work for RDP servers/sessions

Therefore, an additional control is required:

restricting software installation at the RDP server level
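
The reasoning above can be sketched as a coverage check over access paths. This is a deliberately simplified model: the path categories (`browser`, `desktop_app`, `workstation`, `rdp_server`) are hypothetical, and the assumption that Global Secure Access covers every non-RDP path is mine, based only on the two limitations stated above.

```python
from itertools import product

# Hypothetical access-path dimensions, for illustration only.
CLIENTS = ("browser", "desktop_app")
HOSTS = ("workstation", "rdp_server")

# Each control is modeled as a predicate over (client, host).
# Assumption: these predicates encode only the limitations listed above.
CONTROLS = {
    "Defender for Cloud Apps": lambda client, host: client == "browser",
    "Global Secure Access": lambda client, host: host != "rdp_server",
}


def uncovered_paths(controls):
    """Return every (client, host) path that no control covers."""
    return [
        (client, host)
        for client, host in product(CLIENTS, HOSTS)
        if not any(covers(client, host) for covers in controls.values())
    ]


# Adding the RDP-level installation restriction as a third control.
WITH_RDP_RESTRICTION = {
    **CONTROLS,
    "RDP software installation restriction":
        lambda client, host: client == "desktop_app" and host == "rdp_server",
}
```

Under these assumptions, `uncovered_paths(CONTROLS)` returns only the desktop application on an RDP server, and `uncovered_paths(WITH_RDP_RESTRICTION)` returns an empty list, which is why the extra control is needed.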

Issues

The solution described in section 3 (Blocking Unauthorized AI Services) does not work reliably: files can still be uploaded to virtually any AI service

When using policies for Enterprise applications and devices with the “for Device” option, it is not possible to select Generative AI Websites from the Sensitive service domain groups

Additionally, this policy type is not capable of blocking file uploads

Limitations of the Current Approach

If Copilot is used as the primary AI service:

when a user adds a file directly from SharePoint or OneDrive, Defender for Cloud Apps policies do not apply

in this scenario the file is not uploaded in the traditional sense, but transferred through another mechanism that is not inspected or controlled by the policy

Answer 1

The overall direction is aligned with Microsoft’s guidance: use Microsoft Purview sensitivity labels, Purview DLP (including Network Data Security / inline web traffic), Entra ID, and Defender for Cloud Apps to enforce strict data boundaries and prevent oversharing to unmanaged AI. However, the current design has gaps and limitations that need to be addressed or acknowledged.

Key points from the documented capabilities and how they map to the design:

Use sensitivity labels as the primary control for AI interactions

Sensitivity labels are the right foundation for controlling which files can be processed by AI.
When a label applies encryption, AI apps must honor that protection: both the VIEW and EXTRACT usage rights are required before content is returned to users or AI apps.
This protection applies not only to data in SharePoint/OneDrive but also to data in use in Office apps (local storage, network shares, cloud storage) when opened in Word/Excel/PowerPoint/Outlook.
Best practice is to always use sensitivity labels (rather than ad-hoc encryption) so that AI apps and Purview can consistently enforce protection.

Implications for the design:

Label-A should be configured with encryption and appropriate usage rights so that only the approved AI service (and its users) has EXTRACT rights. Other users and apps should have at most VIEW rights, or none.
This gives a strong, data-centric control that travels with the file, independent of network path or client.
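
As a rough illustration of the usage-rights rule, the check can be modeled as a set comparison. This is a sketch and not the rights-management API: the principal names and the `LABEL_A_RIGHTS` mapping are hypothetical, and only the VIEW plus EXTRACT requirement comes from the guidance above.

```python
# Rights that must both be present before an AI app may return content.
REQUIRED_FOR_AI = frozenset({"VIEW", "EXTRACT"})

# Hypothetical rights assignment for Label-A; principal names are illustrative.
LABEL_A_RIGHTS = {
    "approved-ai-users": {"VIEW", "EXTRACT"},
    "all-employees": {"VIEW"},
}


def ai_may_return_content(rights_by_principal, principal):
    """Return True only if the principal holds both VIEW and EXTRACT."""
    granted = rights_by_principal.get(principal, set())
    # Subset test: every required right must be among the granted rights.
    return REQUIRED_FOR_AI <= granted
```

With this assignment, `ai_may_return_content(LABEL_A_RIGHTS, "approved-ai-users")` is true, while a principal with VIEW only, or one with no entry at all, is refused.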

Define and enforce data boundaries for AI workloads

AI workloads should operate within clearly defined data boundaries, with separate datasets and RBAC for different applications and user groups.
Microsoft Purview is recommended to classify data sensitivity and define access policies; Azure RBAC and Private Link are used to restrict access by workload and user group.

Implications for the design:

The approved AI service should be placed in its own data boundary (e.g., dedicated storage accounts, data lakes, or SharePoint sites) and governed by Purview.
Access to Label-A content should be restricted via RBAC and label-based encryption so that only this AI workload (and its service identities) can access and extract content.

Use Purview DLP and Network Data Security for unmanaged AI

Microsoft Purview Data Loss Prevention and Network Data Security are the recommended tools to prevent sharing sensitive information with unmanaged AI apps.
A typical pattern is:
- Define a DLP policy that targets “All unmanaged AI apps” via adaptive app scopes.
- Add conditions on sensitive info types and/or sensitivity labels.
- Configure actions to block “Text sent to or shared with cloud or AI apps” and “File uploaded to or shared with cloud or AI apps”.
- Turn on incident alerts and integrate with security operations.
Network Data Security requires integration with a SASE or non-Microsoft secure browser provider and deployment of the endpoint components.
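
The pattern above can be sketched as a small rule evaluator. The types and field names below (`Event`, `DlpRule`, `app_managed`, and the activity strings) are hypothetical and do not correspond to the Purview policy schema; the sketch only shows how scope, condition, and action combine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    app_managed: bool      # False for an unmanaged AI app
    activity: str          # "text" or "file_upload" (illustrative values)
    label: Optional[str]   # sensitivity label on the content, if any


@dataclass(frozen=True)
class DlpRule:
    blocked_labels: frozenset
    blocked_activities: frozenset

    def evaluate(self, event: Event) -> str:
        """Return "block" or "allow" for a single event."""
        if event.app_managed:
            # Scope: the rule targets unmanaged AI apps only.
            return "allow"
        if (event.activity in self.blocked_activities
                and event.label in self.blocked_labels):
            # Condition matched: labeled content sent to an unmanaged AI app.
            return "block"
        return "allow"


# Example rule: block Label-A content in both prompts and file uploads.
RULE = DlpRule(
    blocked_labels=frozenset({"Label-A"}),
    blocked_activities=frozenset({"text", "file_upload"}),
)
```

For example, `RULE.evaluate(Event(app_managed=False, activity="file_upload", label="Label-A"))` yields `"block"`, while the same upload to a managed app is left to the other controls.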

Implications for the design:

The idea in section 3 (Inline Web Traffic DLP + Label-A) is aligned with documented patterns: block Label-A content from being uploaded to unmanaged AI apps.
Reliability issues likely stem from coverage gaps (e.g., missing SASE/secure browser integration, endpoints not onboarded, or traffic paths not going through the protected channel). Ensuring full integration and coverage is critical.
For broad endpoint coverage (multiple browsers, local AI apps, Office add-ins), Network Data Security plus endpoint DLP is the recommended combination.

Use Purview DLP for Microsoft 365 Copilot and AI apps

For Microsoft 365 Copilot and other supported AI apps, Purview provides specific controls:
- Copilot and AI apps honor existing permissions and label-based encryption; data is never returned if the user lacks access.
- When labels apply encryption, the EXTRACT right is required for AI apps to return data.
- Purview DLP for Microsoft 365 Copilot can be used to prevent Copilot from summarizing sensitive data.

Implications for the design:

If Copilot is the primary AI service, the main control for “only explicitly approved files” is label-based encryption plus DLP for Copilot, not network upload inspection.
The limitation observed (“when a user adds a file directly from SharePoint or OneDrive, Defender for Cloud Apps policies do not apply”) is expected: this path is not a traditional upload and is governed by Copilot’s own data access and Purview DLP, not by Defender for Cloud Apps session controls.
To enforce “only Label-A files may be processed by Copilot” or “Label-A files may only be processed by Copilot”, use:
- Sensitivity labels with encryption and EXTRACT rights scoped to Copilot-eligible users/services.
- Purview DLP for Microsoft 365 Copilot to block summarization of more sensitive labels or to restrict which labels Copilot can use.

Addressing the gaps and reconsiderations

Based on the documented capabilities, the following adjustments strengthen the design:

Make sensitivity labels with encryption the primary enforcement mechanism for which AI apps can process which files. Ensure Label-A is configured so that only the approved AI service (and its identities) has EXTRACT rights.
Use Purview DLP for Microsoft 365 Copilot (and other supported AI apps) to prevent AI from summarizing or returning content for certain labels or sensitive info types.
Use Network Data Security and endpoint DLP to block Label-A content from being sent to unmanaged AI apps over the network, including text prompts and file uploads.
Ensure SASE / secure browser integration is fully configured and that all relevant endpoints and traffic paths are covered; otherwise, uploads to unmanaged AI will bypass inline inspection.
For RDP and server scenarios where Global Secure Access and browser-based controls are limited, complement with endpoint DLP and OS-level controls (e.g., restricting software installation and network paths) as part of the broader security posture.

This combination aligns with Microsoft’s guidance: define data boundaries, classify and label data with Purview, enforce RBAC and label-based encryption, and use Purview DLP (including Network Data Security and Copilot-specific DLP) to prevent sensitive data from being exposed to unauthorized AI applications.
