Client-based auto-labeling recommendations for the Australian Government

This article provides guidance for Australian Government organizations on client-based sensitivity auto-labeling capabilities. Its purpose is to demonstrate how auto-labeling can help to improve data security posture. The guidance is intended to align with the Protective Security Policy Framework (PSPF) and Information Security Manual (ISM).

The Auto-labeling overview details where auto-labeling is suitable in a modern Government work environment and how it helps reduce security risks.

In the Australian Government context, client-based auto-labeling is useful for recommending labels based on:

  • Sensitive content detection
  • Markings applied by external organizations
  • Markings applied by non-Microsoft tools
  • Historical markings
  • Paragraph markings

Client-based auto-labeling is configured directly within a sensitivity label's configuration. This method of auto-labeling applies to Office or online clients and interactively identifies sensitive content, notifies the user, and then either:

  • Automatically applies the sensitivity label relevant to the most sensitive content detected in an item; or
  • Recommends to the user that they apply the label.

PSPF 2024 Requirement 59 and ISM 0271 make it clear that a user should be responsible for applying classifications to items rather than an automated service. Because of this, client-based auto-labeling should be configured to provide user recommendations only:

Requirement | Detail
PSPF 2024 - 09. Classification & Caveats - Requirement 59 | The value, importance, or sensitivity of official information (intended for use as an official record) is assessed by the originator by considering the potential damage to the government, the national interest, organizations, or individuals that would arise if the information’s confidentiality were compromised.
ISM Security Control: 0271 (March 2025) | Protective marking tools don't automatically insert protective markings into emails.

In the following example, the user commenced writing about Project Budgerigar. Detection of a Sensitive Information Type (SIT) triggered the client-based auto-labeling action, and a label recommendation appeared at the top of the email:

Example of a client-based auto-labeling recommendation.

Client-based auto-labeling actions can be triggered based on detection of Sensitive Information Types (SITs), including Exact Data Match SITs, and trainable classifiers. A combination of SITs and classifiers can also be used.

For information on the SITs most relevant to Australian Government, see Identifying sensitive and security classified information for Australian Government, which includes information on creating SITs to detect security classifications.

Client-based auto-labeling scenarios for Australian Government

Client-based auto-labeling helps to protect sensitive information by identifying items that are under-classified. Under-classified information represents a significant risk to Australian Government. Client-based auto-labeling helps ensure that the correct label is applied and that items are marked and protected appropriately. The correct label ensures that only appropriate distribution of information is allowed, as Data Loss Prevention (DLP) and similar controls apply based on an item's label.

Accurate classification helps ensure that need-to-know principles are maintained and that access to information is restricted. These concepts relate to PSPF 2024 Requirement 131:

Requirement | Detail
PSPF 2024 - 17. Access to Resources - Requirement 131 | Access to security classified information or resources is only given to entity personnel with a need-to-know that information.

Recommending labels based on sensitive content detection

Detecting sensitive content and recommending that users apply an appropriate label helps to protect information and ensure need-to-know. The label recommendation options available to client-based auto-labeling also allow us to ensure that a user is responsible for classification decisions rather than an automated system.

Client-based auto-labeling is used to increase item sensitivity where appropriate. For Australian Government, the benefits are seen particularly at the high end of the sensitivity taxonomy. Recommending an increase in classification from UNOFFICIAL to OFFICIAL has little relevance to data security. However, detecting an item that should be PROTECTED but is under-classified could prevent a data breach.

Organizations should compile a list of SITs and classifiers and align them with appropriate sensitivity labels. For example:

Label SIT Use
OFFICIAL Sensitive Personal Privacy Australian Health Records Act Enhanced. This prebuilt SIT seeks to identify occurrences of:
- Australian Tax File Number (TFN)
- My Health Record
- All Full Names
- All Medical Terms and Conditions
- Australia Physical Addresses
Health information relating to an individual is protected under the Privacy Act and could be appropriate for labeling as 'OFFICIAL Sensitive Personal Privacy'.
OFFICIAL Sensitive Legislative Secrecy A 'Legislative Secrecy Keywords' custom SIT containing key words, such as:
- 'Legislative Secrecy Warning:'
As recommended in PSPF Release 2024 Guidelines, a text-based warning notice should be placed at the top and bottom of items relating to legislative information. Organizations should apply these notices via document templates or a similar means. These warning notices could be used to identify items that should be marked with the 'OFFICIAL Sensitive Legislative Secrecy' label.
PROTECTED A Codeword or list of codewords associated with initiatives that should have their information classified as PROTECTED. For example:
- 'Project Budgerigar'

A list of keywords relating to subjects that can be considered highly sensitive and for which loss of information could result in damage or loss of confidence in Government. For example:
- 'data breach'
- 'highly sensitive'
- 'against the law'
- 'code of practice'
- 'breach of trust'
A list of keywords could be used to detect items that contain information relating to a classified project, initiative, system, or application.
Adding a list of keywords considered sensitive to an organization to a SIT allows Microsoft 365 to prompt users to increase the sensitivity of an item when the keywords are detected.
Doing so helps the user to consider need-to-know and allows protections to be applied to items to prevent inappropriate distribution of information (for example, DLP, encryption, and other controls).
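
The alignment of keyword lists with labels described in the table can be sketched in a few lines of Python. This is an illustrative model only, not how Microsoft Purview evaluates SITs: the names `LABEL_ORDER`, `KEYWORD_SITS`, and `recommend_from_keywords` are hypothetical, the label ordering covers only two of the labels above, and case-insensitive substring matching is an assumption. The function returns a recommendation for the user to act on and never applies a label itself.

```python
from typing import Optional

# Illustrative subset of labels, ordered from least to most sensitive.
LABEL_ORDER = ["OFFICIAL Sensitive Legislative Secrecy", "PROTECTED"]

# Hypothetical keyword lists standing in for custom keyword SITs,
# using the example keywords from the table above.
KEYWORD_SITS = {
    "OFFICIAL Sensitive Legislative Secrecy": ["Legislative Secrecy Warning:"],
    "PROTECTED": ["Project Budgerigar"],
}


def recommend_from_keywords(text: str) -> Optional[str]:
    """Return the most sensitive label whose keywords appear in the text."""
    lowered = text.lower()
    matched = [
        label
        for label, keywords in KEYWORD_SITS.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    ]
    if not matched:
        return None
    # When several SITs match, recommend the label for the most
    # sensitive content detected.
    return max(matched, key=LABEL_ORDER.index)
```

When an item matches keywords for more than one label, the sketch recommends the higher label, mirroring the behavior described earlier for the most sensitive content detected in an item.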

The strategies outlined in the previous table can also be used to locate and act on sensitive information via other Microsoft 365 capabilities.

Recommendations based on external agency markings

Many of the controls discussed in this document are enacted based on the labels applied to items. Much of the information generated externally has text-based protective markings in place but by default won't have sensitivity labels relevant to your organization applied. As a result, such items aren't protected by label-based DLP policies. Alerting also won't trigger when these items are saved to lower sensitivity locations.

Situations where this scenario could occur include:

  • When items have been created by other government organizations that adhere to PSPF. In these situations, protective markings are in place but sensitivity label metadata, such as label Globally Unique Identifiers (GUIDs), won't align with your own sensitivity label configuration. Items are marked with security classifications but not labeled with a sensitivity label.
  • When items have been created by other government organizations that don't align, or only partially align, with the PSPF (for example, New South Wales (NSW) government organizations).
  • When items that have been created and classified by foreign governments are shared with Australian Government organizations.

To protect information that your organization has received and is a custodian of, client-based auto-labeling can be used to recommend that equivalent labels are applied to items so that they're protected.

Such configurations make use of SITs to identify markings or classifications applied externally. These SITs then need to be added to the auto-labeling configuration of the relevant sensitivity labels.

Some examples of where SITs can be used to recommend labels based on markings applied externally include:

Label SIT Use
OFFICIAL Sensitive OFFICIAL Sensitive Regex SIT To identify items marked as OFFICIAL: Sensitive but without the OFFICIAL Sensitive label applied to them, including items generated by other organizations.
PROTECTED PROTECTED Regex SIT To identify items marked as PROTECTED but without the PROTECTED label applied.
OFFICIAL Sensitive OFFICIAL Sensitive – NSW Government Information marked with OFFICIAL Sensitive – NSW Government and received by a Federal Government organization isn't labeled by default and therefore doesn't have protections configured that align with the OFFICIAL Sensitive security classification. Marking these items as OFFICIAL Sensitive when modified by your users helps to protect the contained information. Visual markings applied by NSW Government agencies would still be present on the item, making it clear that the item was generated elsewhere (see note 1).
OFFICIAL Sensitive - Legal Privilege OFFICIAL Sensitive – Legal (NSW Gov)

OFFICIAL Sensitive – Law enforcement (NSW Gov)
This configuration would ensure that information marked with either of the NSW State Government legal-related markings is treated in line with OFFICIAL: Sensitive Legal Privilege while it resides within a Federal Government environment.
SECRET CONFIDENTIEL UE CONFIDENTIEL UE is a classification used by members of the European Union. Examples of mappings for foreign government classifications were previously provided in PSPF Policy 7, but have been incorporated into Requirement 82. Previous guidance was to align CONFIDENTIEL UE with a SECRET security classification.

Detecting CONFIDENTIEL UE markings and applying a SECRET label helps to ensure that such information can be identified and potentially removed in line with labels for information that shouldn't be placed on Microsoft 365.

Note

1 An alternative approach might be to include an OFFICIAL Sensitive – NSW Government label within your organization's label taxonomy. This label could be published to an administrative account only. Doing so ensures that it's within scope of the auto-labeling service but not available for users to apply to items. This idea is further discussed in labels for organizations with differing label taxonomies.

Requirement | Detail
PSPF 2024 - 12. Information Sharing - Requirement 82 | Where an international agreement or international arrangement is in place, security classified foreign entity information or resources are safeguarded in accordance with the provisions set out in the agreement or arrangement.

Recommendations based on markings applied by non-Microsoft tools

Many Government organizations currently make, or have previously made, use of non-Microsoft tools to apply markings to files and email. These tools are configured to apply one or more of:

  • X-Protective-Marking x-headers to email
  • Text-based headers and footers to email and documents
  • Subject-based email markings
  • File metadata via document properties

For organizations transitioning from non-Microsoft tools to native Microsoft Purview capabilities, these existing properties or markings can be used to determine which sensitivity label should be applied.

Important

Client-based auto-labeling complements service-based auto-labeling, and both should be used together. For example, consider situations where service-based auto-labeling hasn't yet identified and labeled a sensitive item at rest. In such situations, client-based auto-labeling can detect and recommend a label when the item is opened by a user.

Service-based auto-labeling can't detect content or label email residing within user mailboxes. To help ensure that legacy email is protected, client-based auto-labeling is used to ensure that markings applied to preexisting items are converted to labels when email is forwarded or replied to. For example, consider a preexisting PROTECTED email with a text-based PROTECTED marking applied to it but no sensitivity label. When a user attempts to forward it or reply to it, client-based auto-labeling can identify the item as PROTECTED based on the existing markings and then recommend that the user applies the PROTECTED label to the item.

The following client-based auto-labeling example configurations ensure items containing an existing marking have the correct sensitivity label applied. These configurations also identify markings applied previously by legacy non-Microsoft classification tools and markings on items generated by external PSPF-compliant organizations:

Label SIT requirement Regular Expression
OFFICIAL Sensitive SIT that detects the following marking syntax:
- OFFICIAL Sensitive
- OFFICIAL: Sensitive
- OFFICIAL:Sensitive
\bOFFICIAL( |:|: )Sensitive(?!(\s|\/\/|\/\/ |, )(\bNATIONAL( |-)CABINET\b|[P,p]ersonal( |-)[P,p]rivacy|\b[L,l]egal( |-)[P,p]rivilege\b|[L,l]egislative( |-)[S,s]ecrecy))
PROTECTED SIT that detects the following marking syntax:
- PROTECTED
\bPROTECTED(?!(\s|\/\/|\/\/ |, )(CABINET\b|\bNATIONAL( |-)CABINET\b|[P,p]ersonal( |-)[P,p]rivacy|\b[L,l]egal( |-)[P,p]rivilege\b|[L,l]egislative( |-)[S,s]ecrecy))

Note

These regular expressions are intended to identify items based on security classification but exclude markings with additional Information Management Markers (IMMs) or Caveats applied. More SITs are required to identify items including these extra markings. For a complete list of SIT syntax for Australian Government, see the Example SIT syntax to detect protective markings.
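
The effect of the negative lookahead in these expressions can be checked outside of Microsoft Purview before they are placed in a SIT. The following Python sketch uses the two expressions from the table above with Python's `re` module. The function name `recommend_from_marking` is hypothetical, and Python's regex engine is assumed here to behave the same way as the SIT engine for these patterns, which should be confirmed by testing the SIT itself.

```python
import re
from typing import Optional

# The two expressions from the table above. The negative lookahead
# (?!...) rejects a marking that is followed by a caveat or IMM.
OFFICIAL_SENSITIVE = re.compile(
    r"\bOFFICIAL( |:|: )Sensitive(?!(\s|\/\/|\/\/ |, )"
    r"(\bNATIONAL( |-)CABINET\b|[P,p]ersonal( |-)[P,p]rivacy"
    r"|\b[L,l]egal( |-)[P,p]rivilege\b|[L,l]egislative( |-)[S,s]ecrecy))"
)
PROTECTED = re.compile(
    r"\bPROTECTED(?!(\s|\/\/|\/\/ |, )"
    r"(CABINET\b|\bNATIONAL( |-)CABINET\b|[P,p]ersonal( |-)[P,p]rivacy"
    r"|\b[L,l]egal( |-)[P,p]rivilege\b|[L,l]egislative( |-)[S,s]ecrecy))"
)


def recommend_from_marking(text: str) -> Optional[str]:
    """Return the label to recommend for an existing text marking, if any."""
    # Check the higher classification first.
    if PROTECTED.search(text):
        return "PROTECTED"
    if OFFICIAL_SENSITIVE.search(text):
        return "OFFICIAL Sensitive"
    return None


print(recommend_from_marking("OFFICIAL: Sensitive"))                   # OFFICIAL Sensitive
print(recommend_from_marking("OFFICIAL: Sensitive Personal Privacy"))  # None
print(recommend_from_marking("PROTECTED CABINET"))                     # None
```

Markings that carry an IMM or caveat return no recommendation here because, as noted above, separate SITs are needed to map them to their own labels.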

Recommendations based on historical markings

Government marking requirements do change periodically, as occurred in October of 2018 when markings (for example, CONFIDENTIAL and For Official Use Only (FOUO)) were removed from policy. Government organizations are likely to have a significant amount of information residing on their systems with these historical markings applied.

Handling these historical markings is typically outside of the scope of any new Microsoft Purview deployments. However, if your organization wishes to bring historical markings into scope, they could be split into two categories: markings that have a modern equivalent and markings that don't. The PSPF 2024 Guidelines provide a list of historical classifications and markings along with their current handling requirements.

An easy option for historical markings that align with a modern equivalent is to configure auto-labeling to recommend application of the equivalent label when these items are opened. With this configuration, the user experience is:

  • When the user opens a legacy email and attempts to reply to or forward it, the historical marking is detected. A label recommendation is provided to the user for the new email.
  • When a legacy file is opened, modified, and saved by a user, their Office client detects the previous marking and prompts the user to apply a modern equivalent to the item before saving.

The previous actions help to ensure that appropriate controls are applied to historical items.

Tip

Australian Government Records management requirements are relevant when dealing with historical markings. If an item has been declared as a record by a Microsoft 365 retention label with record configuration enabled, then it's locked, preventing any further edits. The result is that the applied sensitivity label can't be changed, as this requires a change to the item, which can affect its retention period. However, if an item with a historical marking is saved as a new item (for example, used as a template), then recommending a label based on the historical marking can be useful.

The following are examples of how SITs based on historical markings could be configured and used with client-based auto-labeling to suggest a new label based on a historical marking:

Label SIT Use
OFFICIAL Sensitive For Official Use Only SIT containing the following keywords:
- For Official Use Only
- For-Official-Use-Only
- FOUO
X-IN-CONFIDENCE SIT containing the following keywords:
- X-IN-CONFIDENCE
Client-based auto-labeling could be used to identify legacy content with these historical markings applied and suggest a modern alternative on new or edited items based on the legacy items.
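
A keyword SIT of this kind can be approximated in Python to test which legacy text would trigger a recommendation. This is a minimal sketch under stated assumptions: the names `HISTORICAL_KEYWORDS` and `recommend_from_historical` are hypothetical, both keyword lists are assumed to map to the OFFICIAL Sensitive label as the table suggests, and whole-word, case-insensitive matching is an assumption rather than documented SIT behavior.

```python
import re
from typing import Optional

# Keywords taken from the table above. Both lists are assumed to map
# to the OFFICIAL Sensitive label.
HISTORICAL_KEYWORDS = [
    "For Official Use Only",
    "For-Official-Use-Only",
    "FOUO",
    "X-IN-CONFIDENCE",
]

# Build one pattern that matches any keyword as a whole word.
# re.escape keeps hyphens and spaces literal.
HISTORICAL_PATTERN = re.compile(
    "|".join(rf"\b{re.escape(keyword)}\b" for keyword in HISTORICAL_KEYWORDS),
    re.IGNORECASE,
)


def recommend_from_historical(text: str) -> Optional[str]:
    """Suggest a modern label when a historical marking is present."""
    if HISTORICAL_PATTERN.search(text):
        return "OFFICIAL Sensitive"
    return None
```

Historical markings without a modern equivalent are deliberately left out of the list, since no label recommendation applies to them.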

SITs and DLP policies should be configured to check for historical markings and ensure that relevant controls are applied. These configurations ensure items with historical markings sent externally via email have modern labels and associated controls applied.

Recommending labels based on paragraph markings

Some Government organizations make use of paragraph markings in documents. A set of SITs can detect these paragraph markings and recommend a label for the item based on the markings it contains. The label for the document as a whole should aggregate to the highest paragraph marking present.

To achieve this, we could use:

  • An OFFICIAL keyword SIT detecting the (O) paragraph marking and recommending that the OFFICIAL label is applied when detected.
  • An OFFICIAL Sensitive keyword SIT detecting the (O:S) paragraph marking and recommending that the OFFICIAL Sensitive label is applied when detected.
  • A PROTECTED keyword SIT detecting the (P) paragraph marking and recommending that the PROTECTED label is applied when detected.
  • A SECRET keyword SIT detecting the (S) paragraph marking and recommending that the SECRET label is applied when detected.

The SECRET marking SIT can be useful for identifying information that shouldn't be stored within the platform. Checking for items containing such markings can identify spilled data and allow you to prevent a further data breach. For more information, see labels for information that shouldn't be placed on Microsoft 365.

Note

Straightforward keyword SITs like paragraph markings have the potential to generate false positives. For example, if (P) were to appear in a document or email without being intended as a paragraph marking, the service could then recommend that the user marks the item as PROTECTED. For this reason, SITs to identify paragraph markings should be carefully considered before implementation to determine whether false positive matches are likely to occur.
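
The aggregation of paragraph markings to the highest marking can be sketched as follows. This Python example is illustrative only: the names `LEVELS`, `MARKING_TO_LABEL`, and `highest_paragraph_label` are hypothetical, and requiring the marking to appear at the start of a line is an assumed way of reducing false positives, not a documented SIT feature.

```python
import re
from typing import Optional

# Labels ordered from least to most sensitive.
LEVELS = ["OFFICIAL", "OFFICIAL Sensitive", "PROTECTED", "SECRET"]

# Paragraph markings from the list above and the label each implies.
MARKING_TO_LABEL = {
    "(O)": "OFFICIAL",
    "(O:S)": "OFFICIAL Sensitive",
    "(P)": "PROTECTED",
    "(S)": "SECRET",
}

# Only count a marking when it opens a line, so that "(P)" in the
# middle of a sentence is not treated as a paragraph marking.
PARAGRAPH_MARKING = re.compile(r"^\s*(\((?:O:S|O|P|S)\))", re.MULTILINE)


def highest_paragraph_label(text: str) -> Optional[str]:
    """Return the label for the highest paragraph marking in the text."""
    labels = [MARKING_TO_LABEL[m] for m in PARAGRAPH_MARKING.findall(text)]
    if not labels:
        return None
    return max(labels, key=LEVELS.index)


document = "(O) Meeting overview.\n(P) Budget details.\n(O:S) Staff contacts."
print(highest_paragraph_label(document))                    # PROTECTED
print(highest_paragraph_label("Choose option (P) below."))  # None
```

Anchoring the match to the start of a line trades some recall for fewer false positives; markings placed at the end of a paragraph would need a different pattern.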

Example client-based auto-labeling configuration

These examples are based on the use of SITs and classifiers to identify protective markings or sensitive information. Once identified, an appropriate label is then recommended to the user. These examples are boilerplate Australian Government examples and organizations should work to develop their own SITs to identify organization specific information.

Label Suggested SITs
UNOFFICIAL UNOFFICIAL Regex SIT intended to detect an UNOFFICIAL marking.
UNOFFICIAL Paragraph Marking SIT intended to detect (UO).
OFFICIAL OFFICIAL Regex SIT intended to detect an OFFICIAL marking
OFFICIAL Paragraph Marking SIT intended to detect (O).
OFFICIAL Sensitive (Category) N/A
OFFICIAL Sensitive OFFICIAL: Sensitive Regex SIT intended to detect variations of OFFICIAL Sensitive markings without inclusion of Information Management Markers (IMMs) or Caveats.
SITs relating to information about processes or systems where disclosure of information could result in damage.

Prebuilt SITs of:
- All Credential Types
- Credit Card Number

'OFFICIAL Sensitive Paragraph Marking' SIT intended to detect (O:S)
OFFICIAL Sensitive Personal Privacy OFFICIAL: Sensitive Personal Privacy Regex intended to detect the marking.

Prebuilt SITs of:
- Australia Bank Account Number
- Australia Driver's License
- Australia Medical Account Number
- Australia Passport Number
- Australia Tax File Number
OFFICIAL Sensitive Legal Privilege OFFICIAL: Sensitive Legal Privilege Regex SIT intended to detect the marking.

Prebuilt trainable classifier of:
- Legal Affairs
OFFICIAL Sensitive Legislative Secrecy OFFICIAL: Sensitive Legislative Secrecy Regex SIT intended to detect the marking.
OFFICIAL Sensitive NATIONAL CABINET OFFICIAL: Sensitive NATIONAL CABINET Regex SIT intended to detect the marking.
PROTECTED (Category) N/A
PROTECTED PROTECTED Regex SIT intended to detect the marking.

PROTECTED Paragraph Marking SIT intended to detect (P).

Other keyword SITs relating to processes or systems where disclosure of information could result in damage.
PROTECTED Personal Privacy PROTECTED Personal Privacy Regex SIT intended to detect the marking.
PROTECTED Legal Privilege PROTECTED Legal Privilege Regex SIT intended to detect the marking.
PROTECTED Legislative Secrecy PROTECTED Legislative Secrecy Regex SIT intended to detect the marking.
PROTECTED NATIONAL CABINET PROTECTED NATIONAL CABINET Regex SIT intended to detect the marking.
PROTECTED CABINET PROTECTED CABINET Regex SIT intended to detect the marking.

Note

For a list of regular expressions (RegEx) to use in SITs for identifying security classifications, see Identifying sensitive and security classified information for Australian Government or ASD's examples in their Blueprint for Secure Cloud.