Know your data


The first challenge many organizations face is understanding what kind of data they have and where it resides. Before you can protect sensitive data, you need full visibility into your environment. You gain that visibility by identifying, classifying, and managing your sensitive data. Here's a general framework to guide the process:

Describe the categories of sensitive information you want to protect: Start by identifying the types of sensitive data in your organization, such as financial data, customer records, or intellectual property. Then, determine the level of protection each type requires.
Discover and classify sensitive data: Using tools like sensitive information types and trainable classifiers, you can automatically discover sensitive data and label it so that the right protections can be applied.
View and manage your sensitive items: Once classified, sensitive data can be monitored and managed throughout its lifecycle using policies and centralized tools.

Diagram: the steps needed to know your data with Microsoft Purview Information Protection.

As you move through these steps, consider these questions to help refine your strategy:

Who owns my data?
What types of data do I have?
Where is my data?
Why is it a risk?
What methods can I use to classify my data?
Where can I classify my data?
How can I see what happens to my data over its lifecycle?

Now that you understand the basic steps, let's explore how Microsoft Purview enables data classification through its tools and policies. These concepts help you discover, protect, and manage sensitive data across your environment.

Data classification concepts

Classification involves identifying and labeling content in your organization to better understand your data landscape.

Sensitive information types

Sensitive information types let you detect common types of sensitive data automatically. Because this data is identified without manual review, it can be protected consistently, which reduces the risk of data breaches. Microsoft Purview provides over 300 built-in sensitive information types, such as credit card numbers, Social Security numbers, and other regulated data. These types are detected through patterns defined by regular expressions or functions (for example, a function that validates a number's checksum). For organizations with unique needs, you can create custom sensitive information types to capture proprietary or specialized data.
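To make the idea of "a regular expression plus a function" concrete, here is a minimal Python sketch of a credit-card-style detector. It is not Purview's rule format or API: the pattern, the keyword list, the 300-character proximity window, and the "high"/"medium" confidence labels are all hypothetical choices for illustration. A regex finds candidates, a Luhn checksum function rejects lookalikes, and nearby keywords raise confidence.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: 16 digits, optionally grouped in fours with spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")
# Hypothetical supporting keywords that raise confidence when found near a match.
KEYWORDS = ("credit card", "card number", "visa", "mastercard")
PROXIMITY = 300  # hypothetical: characters searched on either side of a match


@dataclass
class Match:
    value: str
    confidence: str  # "high" if a keyword is nearby, otherwise "medium"


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Double every second digit from the right; subtract 9 when the result exceeds 9.
    for i, d in enumerate(int(c) for c in reversed(number)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[Match]:
    """Find pattern matches that also pass the checksum function."""
    matches = []
    lowered = text.lower()
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if not luhn_valid(digits):
            continue  # the regex matched, but the checksum function rejected it
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        has_keyword = any(k in window for k in KEYWORDS)
        matches.append(Match(digits, "high" if has_keyword else "medium"))
    return matches
```

For example, `find_card_numbers("Card number: 4111 1111 1111 1111")` returns one high-confidence match, while `"Order 1234 5678 9012 3456"` is ignored because it fails the checksum. Combining a pattern with a validating function is what keeps a detector like this from flagging every 16-digit string.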

Trainable classifiers

Trainable classifiers use artificial intelligence and machine learning to identify content specific to your organization, such as contracts or customer records, without relying solely on pattern matching. These classifiers reduce the risk of missing sensitive content that doesn't fit standard patterns, offering more comprehensive protection.
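The difference from pattern matching is that a classifier learns from labeled examples. The sketch below shows that principle with a toy multinomial naive Bayes model written in plain Python. This is a swapped-in technique chosen for brevity. It does not reflect how Purview's trainable classifiers are actually built or trained, and the class name, labels, and training sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Toy multinomial naive Bayes with Laplace smoothing (illustration only)."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # class prior
            for word in tokenize(doc):
                # Laplace smoothing keeps unseen words from zeroing out the score.
                count = self.word_counts[c][word] + 1
                score += math.log(count / (self.totals[c] + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical training examples: a few "contract" documents and a few others.
clf = NaiveBayesClassifier().fit(
    [
        "this agreement is made between the parties and the supplier shall indemnify",
        "the parties agree to the terms of this contract and its termination clause",
        "team lunch is on friday please bring snacks",
        "reminder the office kitchen will be cleaned on friday",
    ],
    ["contract", "contract", "other", "other"],
)
```

With this toy training set, `clf.predict("the supplier agrees to the termination terms of the agreement")` returns `"contract"` even though no fixed pattern describes a contract. A real classifier needs far more examples than this.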


Exact data match (EDM)

Exact data match (EDM) provides a highly accurate way to identify sensitive information by matching specific, predefined data values from a secure data source. This is especially useful when your organization deals with highly sensitive, structured data, like employee IDs or customer account numbers. Because EDM targets those exact values rather than general patterns or keywords, it helps reduce false positives.
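The core of exact matching can be sketched as follows: hash each value from the source table once, then flag content only when a candidate's hash is in that set. This is a minimal Python illustration under stated assumptions, not Purview's EDM schema, upload tool, or hashing scheme. The salted SHA-256 hash, the lowercase normalization, and the `EMP-` followed by six digits employee ID format are all hypothetical.

```python
import hashlib
import re
import secrets


def hash_value(value: str, salt: bytes) -> str:
    """Salted SHA-256 of a value after hypothetical normalization (strip + lowercase)."""
    return hashlib.sha256(salt + value.strip().lower().encode("utf-8")).hexdigest()


def build_hashed_table(values: list[str], salt: bytes) -> set[str]:
    """Hash every value from the source table so the matcher never holds plaintext."""
    return {hash_value(v, salt) for v in values}


# Hypothetical candidate pattern: employee IDs such as "EMP-123456".
CANDIDATE = re.compile(r"\bEMP-\d{6}\b")


def exact_matches(text: str, hashed_table: set[str], salt: bytes) -> list[str]:
    """Return candidates whose hash appears in the table; lookalike IDs are ignored."""
    return [
        m.group()
        for m in CANDIDATE.finditer(text)
        if hash_value(m.group(), salt) in hashed_table
    ]


salt = secrets.token_bytes(16)
table = build_hashed_table(["EMP-100234", "EMP-100871"], salt)
```

Here `exact_matches("Ticket for EMP-100234 and EMP-999999", table, salt)` returns only `["EMP-100234"]`. Both strings fit the ID pattern, but only one is a real value from the source table, which is why exact matching produces fewer false positives than pattern matching alone.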

Data exploration and monitoring tools

Once data is classified, it's important to monitor and explore how it's stored, accessed, and used across your organization.

Data and content explorer

Data and content explorer provides a comprehensive view of where sensitive data is stored in your organization, categorized by type, such as credit card numbers or health information. Knowing where sensitive data resides helps you map your risk profile and adjust your protection strategies accordingly.
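The kind of view described above is essentially an aggregation of classification results by type and location. The minimal Python sketch below shows that aggregation. It is not the explorer's data model or export format: the `ClassifiedItem` record, its fields, and the location and type names in the example are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifiedItem:
    location: str                      # hypothetical: a site, mailbox, or share name
    path: str                          # hypothetical item identifier
    sensitive_types: tuple[str, ...]   # types detected in the item


def summarize(items: list[ClassifiedItem]) -> dict[str, Counter]:
    """Count items per sensitive information type, broken down by location."""
    summary: dict[str, Counter] = defaultdict(Counter)
    for item in items:
        for sit in set(item.sensitive_types):  # count each item once per type
            summary[sit][item.location] += 1
    return summary


# Hypothetical classification results.
items = [
    ClassifiedItem("HR site", "/hr/payroll.xlsx", ("Credit card number", "Social Security number")),
    ClassifiedItem("HR site", "/hr/benefits.docx", ("Social Security number",)),
    ClassifiedItem("Finance mailbox", "msg-42", ("Credit card number",)),
]
overview = summarize(items)
```

In this example, `overview["Social Security number"]` shows two items, both on the HR site, which is the kind of concentration that would point you toward where protection should be tightened first.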

Activity explorer

Activity explorer provides insights into how sensitive data is being accessed and used. It tracks activities such as who accessed specific files, what actions were performed on those files, and where potential risks might lie. This visibility into data activity helps you confirm that classified data is being handled appropriately and identify and mitigate potential risks.
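To illustrate how activity records can surface risk, here is a minimal Python sketch that filters events on sensitive files and counts actions per user. It is not activity explorer's schema or API: the `Activity` record, the action names in `RISKY_ACTIONS`, and the sample users and files are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Activity:
    timestamp: datetime
    user: str
    file: str
    action: str      # hypothetical action names, e.g. "read", "label_removed"
    sensitive: bool  # whether the file was classified as sensitive


# Hypothetical actions treated as risky when they involve sensitive files.
RISKY_ACTIONS = {"label_removed", "copied_to_usb", "shared_externally"}


def risky_activities(events: list[Activity]) -> list[Activity]:
    """Keep only risky actions performed on sensitive files."""
    return [e for e in events if e.sensitive and e.action in RISKY_ACTIONS]


def actions_by_user(events: list[Activity]) -> Counter:
    """Count how often each user performed each action."""
    return Counter((e.user, e.action) for e in events)


# Hypothetical activity log.
events = [
    Activity(datetime(2024, 5, 1, 9, 0), "alex", "payroll.xlsx", "read", True),
    Activity(datetime(2024, 5, 1, 9, 5), "alex", "payroll.xlsx", "copied_to_usb", True),
    Activity(datetime(2024, 5, 1, 10, 0), "sam", "menu.docx", "shared_externally", False),
]
flagged = risky_activities(events)
```

Only the USB copy of `payroll.xlsx` is flagged here. Sharing `menu.docx` externally is not, because that file isn't classified as sensitive. This illustrates why activity monitoring depends on classification being in place first.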