Microsoft Information Protection SDK - Metadata
The Microsoft Information Protection SDK generates the set of metadata that should be applied to a file. This metadata is a representation of the label. This document describes the metadata the SDK generates to apply to mail, documents, and other records.
Labels in the Microsoft Information Protection SDK are applied to information to describe its sensitivity. Label data is persisted to the file or record as a set of key-value pairs that describe the label. Each metadata name is built on the following structure:
DefinedPrefix_ElementType_GlobalIdentifier_AttributeName
When applied to data labeled with Microsoft Purview Information Protection, the result is:
MSIP_Label_GUID_Enabled = true
The GUID is the unique identifier of a label within an organization. An object can carry only one label from a given organization; multiple labels from the same organization can't be written to the same file or object.
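The naming structure above can be illustrated with a small, SDK-independent sketch. The helper names `build_key` and `parse_key` and the regular expression are illustrative assumptions, not part of the MIP SDK; they only show how a key such as MSIP_Label_GUID_Enabled breaks down into its parts.

```python
import re
import uuid

# Matches DefinedPrefix_ElementType_GlobalIdentifier_AttributeName for label metadata.
# The pattern is an assumption made for illustration, not an SDK definition.
KEY_PATTERN = re.compile(
    r"^MSIP_Label_"
    r"(?P<guid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})_"
    r"(?P<attribute>\w+)$"
)


def build_key(label_id: str, attribute: str) -> str:
    """Compose a metadata key from a label GUID and an attribute name."""
    # uuid.UUID raises ValueError when label_id is not a well-formed GUID.
    guid = str(uuid.UUID(label_id))
    return f"MSIP_Label_{guid}_{attribute}"


def parse_key(key: str):
    """Return (label GUID, attribute name), or None if the key doesn't match."""
    match = KEY_PATTERN.match(key)
    if match is None:
        return None
    return match.group("guid").lower(), match.group("attribute")
```

For example, `parse_key(build_key("2096f6a2-d2f7-48be-b329-b73aaa526e5d", "Enabled"))` yields the GUID and the attribute name `Enabled` again.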
The MIP SDK applies the following set of metadata.
Attribute | Type or Value | Description | Mandatory |
---|---|---|---|
Enabled | True or False | This attribute indicates whether the classification represented by this set of key-value pairs is enabled for the data item. DLP products typically validate the existence of this key to identify the classification label. | Yes |
SiteId | GUID | Microsoft Entra tenant ID | Yes |
ActionId (Removed in MIP SDK 1.8 and later) | GUID | ActionId is changed each time a label is set. Audit logs include both the old and the new ActionId to allow chaining of labeling activity to the data item. | Yes |
Method | Standard or Privileged | Set via mip::AssignmentMethod. Standard implies that the label is applied by default or automatically. Privileged implies that the label was manually selected. | No |
SetDate | Extended ISO 8601 Date Format | The timestamp when the label was set. | No |
Name | string | Label unique name within the tenant. It doesn't necessarily correspond to display name. | No |
ContentBits | integer | ContentBits is a bitmask that describes the types of content marking that were applied to content based on policy and client capabilities. CONTENT_HEADER = 0x1, CONTENT_FOOTER = 0x2, WATERMARK = 0x4, ENCRYPT = 0x8 | No |
When applied to a file, the result is similar to the table below.
Key | Value |
---|---|
MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_Enabled | true |
MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_SetDate | 2018-11-08T21:13:16-0800 |
MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_Method | Privileged |
MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_Name | Confidential |
MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_SiteId | cb46c030-1825-4e81-a295-151c039dbf02 |
MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_ContentBits | 2 |
MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_ActionId | 88124cf5-1340-457d-90e1-0000a9427c99 |
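A consumer that reads these properties back typically needs to regroup the flat keys by label. The following is a minimal sketch, assuming the properties have already been read into a plain dictionary of strings; `group_by_label` and `enabled_labels` are hypothetical helper names, and this isn't how the SDK itself reads labels.

```python
from collections import defaultdict

PREFIX = "MSIP_Label_"
GUID_LENGTH = 36  # length of a hyphenated GUID


def group_by_label(properties: dict) -> dict:
    """Group flat MSIP_Label_<GUID>_<Attribute> entries by label GUID."""
    labels = defaultdict(dict)
    for key, value in properties.items():
        if not key.startswith(PREFIX):
            continue  # unrelated property, not label metadata
        rest = key[len(PREFIX):]
        guid = rest[:GUID_LENGTH]
        separator = rest[GUID_LENGTH:GUID_LENGTH + 1]
        attribute = rest[GUID_LENGTH + 1:]
        if separator != "_" or not attribute:
            continue  # malformed key
        labels[guid.lower()][attribute] = value
    return dict(labels)


def enabled_labels(properties: dict) -> list:
    """Return the GUIDs of labels whose Enabled attribute is true."""
    return [
        guid
        for guid, attributes in group_by_label(properties).items()
        if attributes.get("Enabled", "").lower() == "true"
    ]
```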
The contentBits metadata property in MIP SDK is used to indicate which content-modifying actions were taken on a file by the labeling application.
contentBits can be read by downstream applications to understand which actions were applied to the file. Using this property, an application like Word, Excel, or PowerPoint can compare the value on the file metadata to the value it computes based on the policy. If there's a difference, it may apply the missing information.
Take, for example, the MIP File SDK. It doesn't support applying header, footer, or watermark directly to a file. When a file is labeled with MIP SDK, the only outcomes for contentBits are 0x0 if the file is unprotected or 0x8 if the file is protected. This is true even if the label policy has header, footer, and watermark configured.
When the file is opened in Microsoft Word, the header, footer, and watermark are missing. When the user clicks Save, the policy is reevaluated and contentBits on the file is compared to the policy result. If there's a difference, the application applies the missing content marking.
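The comparison described above amounts to a bitmask difference. The sketch below uses the flag values from the attribute table; the `ContentBits` enum, `missing_markings`, and `marking_names` are illustrative names rather than SDK types, and the policy value is assumed to be computed elsewhere.

```python
from enum import IntFlag


class ContentBits(IntFlag):
    """Flag values taken from the ContentBits row of the attribute table."""
    CONTENT_HEADER = 0x1
    CONTENT_FOOTER = 0x2
    WATERMARK = 0x4
    ENCRYPT = 0x8


ALL_FLAGS = (
    ContentBits.CONTENT_HEADER,
    ContentBits.CONTENT_FOOTER,
    ContentBits.WATERMARK,
    ContentBits.ENCRYPT,
)


def missing_markings(on_file: int, from_policy: int) -> ContentBits:
    """Return the flags the policy requires that are not recorded on the file."""
    return ContentBits(from_policy & ~on_file)


def marking_names(bits: int) -> list:
    """List the names of the flags set in bits."""
    return [flag.name for flag in ALL_FLAGS if bits & flag]
```

With a file labeled by the File SDK (`0x8`) and a policy that configures header, footer, watermark, and encryption (`0xF`), `missing_markings(0x8, 0xF)` returns the three content-marking flags and leaves out `ENCRYPT`.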
Note
This feature requires the Azure Information Protection client today, and works only in Word, Excel, or PowerPoint. It does not apply to Outlook.
Custom metadata may be appended via File and Policy SDK. Custom attributes must maintain the base MSIP_Label_GUID prefix.
For example, an application written by Contoso Corporation must apply metadata indicating which system generated a labeled file. The application can create a new label, prefixed with MSIP_Label_GUID. The software vendor name and custom attribute are appended to the prefix to generate the custom metadata.
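using System.Collections.Generic;

// Standard: the label is applied by default or automatically, not manually selected.
LabelingOptions options = new()
{
    AssignmentMethod = AssignmentMethod.Standard
};
// Each extended property is written as MSIP_Label_<GUID>_<key> = <value>.
options.ExtendedProperties = new List<KeyValuePair<string, string>>();
options.ExtendedProperties.Add(new KeyValuePair<string, string>("GeneratedBy", "HRReportingSystem"));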
The resulting metadata entry is:
MSIP_Label_f048e7b8-f3aa-4857-bf32-a317f4bc3f29_GeneratedBy = HRReportingSystem
Note
To maintain compatibility across common applications, the maximum length of each key and each value is 255 characters.
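An application might check this limit before handing custom attributes to the SDK. The sketch below is one way to do that. `custom_metadata_entry` is a hypothetical helper, and it assumes the 255-character limit applies to the full key including the MSIP_Label_GUID prefix; the source doesn't spell out whether the prefix counts.

```python
MAX_LENGTH = 255  # maximum characters for a key and for a value


def custom_metadata_entry(label_id: str, name: str, value: str):
    """Build the full custom metadata key and check both length limits."""
    # Assumption: the limit covers the whole key, prefix included.
    key = f"MSIP_Label_{label_id}_{name}"
    for kind, text in (("key", key), ("value", value)):
        if len(text) > MAX_LENGTH:
            raise ValueError(
                f"custom metadata {kind} is {len(text)} characters; "
                f"the maximum is {MAX_LENGTH}"
            )
    return key, value
```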
Note
When the protected co-authoring experience is enabled, custom metadata properties will be written to the custom.xml component of supported Office file types.
Over time, attributes will be introduced, modified, or retired. It's expected that applications continue to handle these old or retired attributes, as replacing the value across an enterprise may take years.
When replacing an attribute with a newer version, a version suffix should be added to the attribute:
MSIP_Label_GUID_EnabledV2 = True | False | Condition
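A reader that has to cope with both old and new attribute versions could resolve them as in the following sketch. It assumes the attributes of one label are already available as a dictionary, treats the unsuffixed attribute as version 1, and uses a hypothetical `resolve_attribute` helper; none of this is prescribed by the SDK.

```python
import re


def resolve_attribute(attributes: dict, name: str, max_version: int = 1):
    """Return (version, value) for the newest version of name that the
    caller understands, or None when no usable version is present."""
    pattern = re.compile(rf"^{re.escape(name)}(?:V(\d+))?$")
    best = None
    for key, value in attributes.items():
        match = pattern.match(key)
        if match is None:
            continue
        # Assumption: an attribute without a suffix is version 1.
        version = int(match.group(1)) if match.group(1) else 1
        if version <= max_version and (best is None or version > best[0]):
            best = (version, value)
    return best
```

An older application that only understands the original attribute keeps reading `Enabled`, while a newer one passes `max_version=2` and picks up `EnabledV2` when it's present.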
When co-authoring for files encrypted with sensitivity labels is enabled, the label metadata storage location and format for Word, Excel, and PowerPoint files change.
When the co-authoring feature is enabled, the SDK automatically reads and writes information to and from the updated metadata storage location. If label metadata doesn't exist in the new location, the SDK will attempt to read the metadata from the legacy storage location. If found, the label is available via get label operations. The next time a set label operation is performed, the metadata is written to the updated LabelInfo locations.
The purpose of the properties as previously described doesn't change.
The following references describe the format and storage locations:
Metadata applied to email maintains a key/value pair format similar to that of documents. The primary difference is that all attributes are serialized into a single email header called MSIP_Labels. The key/value pairs are delimited by a semicolon and a space, and placed in the new header.
Using the sample metadata above:
MSIP_Labels: MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_Enabled=true; MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_SetDate=2018-11-08T21:13:16-0800; MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_Method=Privileged; MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_Name=Confidential; MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_SiteId=cb46c030-1825-4e81-a295-151c039dbf02; MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_ContentBits=2; MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_ActionId=88124cf5-1340-457d-90e1-0000a9427c99
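The header value can be split back into key/value pairs along the same delimiters. The following is a minimal sketch, assuming the header value has already been extracted from the message and that values contain no semicolons; `parse_msip_labels` and `serialize_msip_labels` are illustrative names, not SDK functions.

```python
def parse_msip_labels(header_value: str) -> dict:
    """Split an MSIP_Labels header value into a dictionary of key/value pairs."""
    pairs = {}
    for item in header_value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, separator, value = item.partition("=")
        if not separator:
            continue  # skip fragments without a key=value shape
        pairs[key.strip()] = value.strip()
    return pairs


def serialize_msip_labels(pairs: dict) -> str:
    """Join key/value pairs with a semicolon and a space, as in the header."""
    return "; ".join(f"{key}={value}" for key, value in pairs.items())
```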