Data health and validation in School Data Sync


As discussed in the previous unit, after you connect your SIS data, you can enable Managed Data scenarios that use Microsoft 365 provisioning to support:

  • Users
  • Classes
  • Class teams
  • SharePoint sites
  • OneNote Class Notebooks
  • Intune management
  • Third-party app integrations

During each sync run, School Data Sync (SDS) validates the incoming data so that only valid required and optional data enters the SDS cache.

Screenshot of the School Data Sync dashboard showing the system status, including sync health and institution, user, and group stats.

What happens during data validation

During each sync run, the inbound flow prepares SIS data for import:

  • Standardized schema: SDS maps roster data to a temporary import schema it uses across all supported input formats.
  • Microsoft Entra ID cache: SDS pulls a copy of the tenant's Microsoft Entra ID users and groups into its cache. This cache is used to run identity matching rules to identify existing users.
    • At this stage, matches are stored only in the SDS cache.
    • SDS doesn't write any values back to Microsoft Entra ID until a Managed Data user provisioning scenario is enabled.
  • Advanced validation: SDS applies multiple categories of validation rules, including:
    • Type and code validation (for example, grade levels, subjects, and other List of Values fields)
    • Data matching rules
    • Required and cross-reference validation (to ensure data links across files or endpoints are correct)

Note

Only data that passes validation is written to the SDS cache.

Errors and warnings

Validation results include:

  • Errors: Required data is missing or invalid. The entire record is excluded from processing.
  • Warnings: Optional data is invalid and removed, but the rest of the record continues to process.
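The following Python sketch models the error/warning distinction for a single record. It's an illustration under stated assumptions, not SDS's implementation: the field names, the `REQUIRED` and `OPTIONAL` rule tables, and the `validate_record` function are hypothetical, and each check is far simpler than SDS's real rules.

```python
from typing import Callable, Optional

# Hypothetical rules for a user record; SDS's real rule set is much larger.
REQUIRED: dict[str, Callable[[str], bool]] = {
    "sourcedId": lambda v: bool(v.strip()),
    "username": lambda v: "@" in v,
}
OPTIONAL: dict[str, Callable[[str], bool]] = {
    "grade": lambda v: v.isdigit(),
}


def validate_record(record: dict[str, str]) -> tuple[Optional[dict[str, str]], list[str]]:
    """Return (record to cache, or None if excluded) and the issues found."""
    issues: list[str] = []
    for field, is_valid in REQUIRED.items():
        if not is_valid(record.get(field, "")):
            # Error: required data is missing or invalid.
            issues.append(f"ERROR: {field}")
    if issues:
        # Any error excludes the entire record from processing.
        return None, issues
    cleaned = dict(record)
    for field, is_valid in OPTIONAL.items():
        if field in cleaned and not is_valid(cleaned[field]):
            # Warning: only the invalid optional value is removed.
            del cleaned[field]
            issues.append(f"WARNING: {field}")
    return cleaned, issues
```

With `grade` set to `tenth`, the record is kept without its `grade` value and one warning is reported; an empty `sourcedId` would instead exclude the whole record.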

To support troubleshooting, SDS generates a Customer Errors and Warnings file that includes:

  • One row per affected record
  • The validation rule that failed
  • Whether the issue is an error or warning

This report helps you understand the health of your incoming SIS data.

How SDS stores data in the cache

When validated data is written to the SDS cache, SDS tracks the lifecycle of each record to help identify data changes over time.

For each record, SDS stores:

  • FirstSeenDateTime: When SDS first encountered the record
  • LastSeenDateTime: The latest sync run in which the record was present
  • IsActiveInSession: Whether the record is currently active based on the latest sync

SDS updates these values to help you accurately track changes, troubleshoot data issues, and monitor overall data health.

  • New record detected:
    • Sets FirstSeenDateTime and LastSeenDateTime to the current time
    • Marks the record as active
  • Record still present in subsequent runs:
    • Keeps FirstSeenDateTime unchanged
    • Updates LastSeenDateTime
    • Leaves IsActiveInSession = true
  • Record missing in a subsequent run:
    • Keeps FirstSeenDateTime and LastSeenDateTime
    • Sets IsActiveInSession = false
    • Indicates only that SDS didn't see the record in the current run (not necessarily that it was deleted in the SIS)
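As a mental model for these lifecycle rules, here is a minimal Python sketch that applies one sync run to an in-memory cache keyed by sourced ID. `CacheEntry` and `apply_sync_run` are hypothetical names whose fields mirror FirstSeenDateTime, LastSeenDateTime, and IsActiveInSession. The sketch also assumes that a record that reappears is updated in place and marked active again.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CacheEntry:
    first_seen: datetime  # FirstSeenDateTime
    last_seen: datetime   # LastSeenDateTime
    is_active: bool       # IsActiveInSession


def apply_sync_run(cache: dict[str, CacheEntry], seen_ids: set[str], run_time: datetime) -> None:
    """Update lifecycle fields in place for one sync run (hypothetical model)."""
    for record_id in seen_ids:
        entry = cache.get(record_id)
        if entry is None:
            # New record: both timestamps set to this run, record marked active.
            cache[record_id] = CacheEntry(run_time, run_time, True)
        else:
            # Present again: keep FirstSeenDateTime, refresh LastSeenDateTime.
            # A record that reappears is updated, not recreated.
            entry.last_seen = run_time
            entry.is_active = True
    for record_id, entry in cache.items():
        if record_id not in seen_ids:
            # Missing this run: timestamps untouched, only the active flag changes.
            # This doesn't imply the record was deleted in the SIS.
            entry.is_active = False
```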

General data-handling rules for missing data in subsequent sync runs

When a record is missing in a later sync run for the same source and academic session, SDS applies the following handling rules:

  • Organizations: No change; the record persists.
  • Users: IsActiveInSession is set to false on the Person, OrganizationRole, and EnrollmentRole associations.
  • Organization roles: IsActiveInSession is set to false.
  • Academic sessions: No change; the record persists.
  • Classes (sections): IsActiveInSession is set to false on the Section, SectionSession, and Enrollment records.
  • Enrollment: IsActiveInSession is set to false on the Enrollment record.
  • Courses: IsActiveInSession is set to false on the Course record.
  • Demographics: No change; the record persists.
  • User flags: No change; the record persists.
  • Relationships (parents/guardians): No change; the record persists.
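If you script checks against exported cache data, it can help to encode these rules as data. This Python mapping is a hypothetical transcription of the list, not an SDS API: the record names come from the rules, and an empty tuple means the record persists unchanged. The Organization roles rule doesn't name a target record, so the sketch assumes OrganizationRole.

```python
# Hypothetical transcription of the missing-record rules: for each missing
# entity type, the records whose IsActiveInSession flag is set to false.
MISSING_RECORD_RULES: dict[str, tuple[str, ...]] = {
    "Organizations": (),
    "Users": ("Person", "OrganizationRole", "EnrollmentRole"),
    "Organization roles": ("OrganizationRole",),  # assumed target record
    "Academic sessions": (),
    "Classes (sections)": ("Section", "SectionSession", "Enrollment"),
    "Enrollment": ("Enrollment",),
    "Courses": ("Course",),
    "Demographics": (),
    "User flags": (),
    "Relationships (parents/guardians)": (),
}


def records_to_inactivate(missing_entity: str) -> tuple[str, ...]:
    """Records marked IsActiveInSession = false when this entity is missing."""
    return MISSING_RECORD_RULES[missing_entity]


def persists_unchanged(missing_entity: str) -> bool:
    """True when a missing record of this type is left exactly as it was."""
    return not MISSING_RECORD_RULES[missing_entity]
```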

Rolling updates for inactivated records

When a record like a user or enrollment isn't present in a later sync:

  • User: The record persists (not deactivated), and FirstSeenDateTime and LastSeenDateTime are preserved. IsActiveInSession is set to false for:
    • OrganizationRole
    • EnrollmentRole
    • Other association records
  • Organization: The record persists (not deactivated), and FirstSeenDateTime and LastSeenDateTime are preserved. IsActiveInSession is set to false for:
    • OrganizationRole
    • EnrollmentRole
    • Other association records
  • Session: The record persists (not deactivated), and FirstSeenDateTime and LastSeenDateTime are preserved. IsActiveInSession is set to false for:
    • OrganizationRole
    • EnrollmentRole
    • Other association records
  • Enrollment: FirstSeenDateTime and LastSeenDateTime are preserved. IsActiveInSession is set to false for:
    • OrganizationRole
    • EnrollmentRole
    • Other association records

If a previously missing record (such as a user's section enrollment) reappears within the same academic session, SDS updates the existing record rather than creating a new one.

Sync health overview

  • If no errors or warnings are found:
    • The run is marked Completed.
    • The SDS homepage displays: "No data errors or warnings found. We did not encounter any data errors or warnings during your last run."
  • If errors or warnings are found:
    • SDS flags the run as Completed with errors or Completed with warnings.
    • The homepage displays a notification encouraging admins to review the data: "We found some issues with your data. We recommend reviewing your sync health."
    • Select Investigate Sync Health to review details.
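The completion status of a finished run reduces to a simple precedence rule. The sketch below is a hypothetical helper, not SDS code. It assumes errors take precedence over warnings, which matches the status definitions in this unit (Completed with warnings means only warnings occurred), and it covers only finished runs, not Running or Failed.

```python
def run_status(error_count: int, warning_count: int) -> str:
    """Map a finished run's validation counts to its displayed status (sketch)."""
    if error_count > 0:
        # Any error wins, even if warnings also occurred.
        return "Completed with errors"
    if warning_count > 0:
        return "Completed with warnings"
    return "Completed"
```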

Errors and warnings differ in how much they affect your data:

  • Errors:
    • Required data failed validation
    • The record wasn't sent to the SDS cache
  • Warnings:
    • Optional data failed validation
    • Invalid values were removed, but the record was still included in processing

A downloadable log file is available for deeper investigation.

Understand errors and warnings with Sync Health

Sync Health helps you understand:

  • What data changed during the latest sync
  • Historical trends across the last 14 runs
  • Where issues occurred
  • What actions might be needed in the SIS

Key run-status indicators

  • Running: Sync is in progress
  • Completed: No errors or warnings
  • Completed with errors: Errors occurred
  • Completed with warnings: Only warnings occurred
  • Failed: The run was canceled by the system or customer

Admin actions

  • Download report: Provides detailed error and warning data
  • View run details: Opens additional details in a flyout

Run details

Run details are divided into:

  • Overview
    • Run start time
    • Run end time
    • Overall status
  • Troubleshooting
  • Statistics

Source data

Shows raw data counts before transformation or validation:

  • Organizations
  • Users
  • Classes
  • Enrollments

Transformed data

Shows data counts after transformation and advanced validation:

  • Error count: Records in which required fields are missing or invalid
  • Warning count: Records in which invalid optional values were removed
  • Matched users: SIS users linked to Microsoft Entra ID users
  • Unmatched users: SIS users with active roles but no matching Microsoft Entra ID user

Stages view

The Stages tab displays the sequence of steps used to process data during the sync run:

  • Connected Data (institution data)
  • Managed Data provision types, including:
    • Microsoft 365 users
    • Microsoft 365 groups (class groups)
    • Microsoft 365 administrative units and security groups

Stage status values include:

  • Completed: No errors or warnings
  • Completed with errors: Errors occurred
  • Completed with warnings: Only warnings occurred
  • Failed: Stage canceled

Advanced validation rules

During processing, records pass through advanced validation to ensure data integrity. SDS checks:

  • Data format
  • Required fields
  • Cross-record relationships
  • List of Values (enums)
  • Identity-matching rules
  • Type validation

SDS validates field values against seven main data types:

  • Unique ID:
    • Must be globally unique
    • Case sensitive
    • Stored as received
  • List of Values (enums):
    • Validates against predefined or custom codes
    • Case-insensitive matching
    • Stored as the normalized code value
  • String:
    • Letters and numbers, typically up to 255 characters
    • Case sensitive
    • Stored as received
  • Email:
    • Must follow RFC 5322 formatting
    • Validates structure, not existence
    • Stored in lowercase
  • Date:
    • Must follow ISO 8601 (YYYYMMDD)
    • Stored in ISO 8601 format
  • Phone:
    • Must follow E.164 (+CountryCodeAreaCodeNumber)
    • Case sensitive and stored as received
  • Boolean:
    • Must be true or false (not 1/0)
    • Case insensitive; stored as lowercase

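Values that fail these rules are flagged as errors or warnings. A failed required value excludes the whole record from the SDS cache; a failed optional value is removed, and the rest of the record continues to process.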
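To see what these rules mean for individual values, here is a Python sketch of several of the checks. The function names are hypothetical and the checks are deliberately simplified: the email pattern tests only basic `local@domain.tld` structure (full RFC 5322 syntax is much broader), and the date check accepts both ISO 8601 calendar forms, basic `YYYYMMDD` and extended `YYYY-MM-DD`, returning the extended form as an assumed storage format. Each function returns the value as it would be stored, or `None` when the value fails.

```python
import re
from datetime import date
from typing import Optional

# Simplified structural email check: something@something.tld, no spaces.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# E.164: "+", a non-zero leading digit, at most 15 digits in total.
E164_RE = re.compile(r"\+[1-9]\d{1,14}")


def normalize_email(value: str) -> Optional[str]:
    # Structure only, not existence; stored in lowercase.
    return value.lower() if EMAIL_RE.fullmatch(value) else None


def normalize_date(value: str) -> Optional[str]:
    if re.fullmatch(r"\d{8}", value):  # basic form, YYYYMMDD
        y, m, d = value[0:4], value[4:6], value[6:8]
    elif re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):  # extended form
        y, m, d = value[0:4], value[5:7], value[8:10]
    else:
        return None
    try:
        return date(int(y), int(m), int(d)).isoformat()
    except ValueError:  # e.g. month 13 or February 30
        return None


def validate_phone(value: str) -> Optional[str]:
    # Stored as received when valid.
    return value if E164_RE.fullmatch(value) else None


def normalize_boolean(value: str) -> Optional[str]:
    v = value.lower()
    return v if v in ("true", "false") else None  # "1"/"0" are rejected


def normalize_enum(value: str, codes: set[str]) -> Optional[str]:
    # Case-insensitive match, stored as the canonical code value.
    lookup = {code.lower(): code for code in codes}
    return lookup.get(value.lower())
```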

More details are available in the SDS documentation under Health Monitoring, Troubleshooting, and Statistics, or at aka.ms/SDSValidationRules.

Investigate data issues with the validation report

After a sync run, admins can begin investigating issues and correcting data in the source system.

Screenshot showing the School Data Sync health page with option to Download copy of cache data.

To investigate flagged data:

  1. Open Sync Health or Run details.
  2. Select Download report.

The report is a comma-separated values (CSV) file:

  • Row 1: Header row
  • Subsequent rows: Records that didn't pass validation

To help you understand the format, let's walk through the key columns.

Screenshot of a sample sync health validation report.

  • Rule: Describes the validation rule that failed for the record. Example: A rule indicating that a user record from the SIS didn't match any Microsoft Entra ID user based on the configured identity matching rules.
  • ExternalIdentifier: The sourced ID (external ID) of the related entity from the SIS, treated as a Unique ID data type. Example: 114009. You can use this value to locate the record in users.csv or through the SIS users endpoint.
  • Severity: Indicates how serious the issue is:
    • ERROR: Required data failed validation
    • WARNING: Optional data failed validation
    Example: WARNING. The validation rule flagged optional data but allowed the record to proceed.
  • EntityCode: Identifies the data area related to the flagged record, such as:
    • User/Person
    • Organization
    • Enrollment
    Example: A user record whose supplied value is used for Microsoft Entra ID matching.
  • FriendlyMessage: Provides human-readable context for the issue. It typically includes:
    • The external ID
    • The value that failed validation
    • A short explanation of what went wrong
    Example: The user record with sourced ID 114009 and username demiller@contoso.edu didn't match a Microsoft Entra ID user based on the current identity rules.

Possible causes of an unmatched user like this one:

  • The SIS has an incorrect username or email value.
  • The corresponding user hasn't yet been created in Microsoft Entra ID (for example, AD sync hasn't completed).

In the first case, fix the SIS data before the next run.

In the second case, confirm that directory sync is working. If the user appears in Microsoft Entra ID before the next SDS run, no change might be needed.

Additional metadata columns

The report also includes platform metadata to help you investigate:

  • FlowName: The flow that ran the validation rule
  • SourceSystemName: The source system from Connect Data (for example, Contoso SIS)
  • Time: When the record was flagged during processing (UTC time)

When you're reviewing errors and warnings:

  1. Prioritize user record errors: Identity issues often cascade into other errors (organization, roster, membership).
  2. Then focus on the entities with the highest error counts, for example an EntityCode (like User or Enrollment) that appears in many error rows.
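Because the report is a plain CSV file, you can also triage it with a short script. This Python sketch counts rows per EntityCode and Severity and orders them by these priorities: user errors first, then other errors by count, with warnings last (that last placement is an assumption). It assumes the column names described in this unit and that user issues carry the EntityCode value `User`; `triage_rank` and `prioritize_issues` are hypothetical helpers, so check your own report's header row and values before relying on it.

```python
import csv
from collections import Counter
from collections.abc import Iterable


def triage_rank(entity_code: str, severity: str) -> int:
    """0 = user errors, 1 = other errors, 2 = warnings (assumed last)."""
    if severity.strip().upper() == "ERROR":
        return 0 if entity_code.strip().lower() == "user" else 1
    return 2


def prioritize_issues(lines: Iterable[str]) -> list[tuple[str, str, int]]:
    """Return (EntityCode, Severity, row count) groups in suggested triage order."""
    counts = Counter(
        (row["EntityCode"], row["Severity"]) for row in csv.DictReader(lines)
    )
    ordered = sorted(
        counts.items(),
        # Rank first, then the highest count, then entity name for a stable order.
        key=lambda item: (triage_rank(*item[0]), -item[1], item[0][0]),
    )
    return [(entity, severity, count) for (entity, severity), count in ordered]
```

To run it against a downloaded report, open the file with `open(path, newline="", encoding="utf-8-sig")` and pass the file object; the `utf-8-sig` encoding also tolerates a byte-order mark if the file has one.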

For user identity errors:

  • Check the SIS fields used for identity rules (username/email/ID).
  • Confirm that Microsoft Entra ID has the expected User Principal Name (UPN) or Mail value.
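Before changing identity rules, a quick offline comparison can show which SIS usernames have no counterpart in Microsoft Entra ID. This sketch assumes an identity rule that matches the SIS username against either the UPN or the Mail value, compared case-insensitively. `find_unmatched` is a hypothetical helper, and you would populate its inputs yourself, for example from a SIS export and from the `userPrincipalName` and `mail` properties of Microsoft Graph user objects.

```python
def find_unmatched(
    sis_users: dict[str, str],  # sourced ID -> SIS username
    entra_upns: set[str],
    entra_mails: set[str],
) -> list[str]:
    """Sourced IDs whose username matches no UPN or Mail value (assumed rule)."""
    # Case-insensitive comparison is an assumption of this sketch.
    known = {v.lower() for v in entra_upns} | {v.lower() for v in entra_mails}
    return sorted(
        sourced_id
        for sourced_id, username in sis_users.items()
        if username.lower() not in known
    )
```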

If a user record fails, any related memberships or associations can also fail validation. Fixing the primary user data typically resolves many downstream errors like:

  • Membership references where the user or class doesn't exist
  • Enrollment issues where the referenced section or user is missing

Data hygiene and SIS quality

Good data hygiene is critical for a successful SDS deployment.

Suggestions for best practices:

  • Understand how your SIS manages fields like username and email.
  • Compare SIS data to Microsoft Entra ID data for consistency.
  • Consider piloting SDS with a smaller subset of schools or classes to assess data quality.
  • When errors occur, review SIS data first; many issues originate from inaccurate or incomplete SIS records.
  • After you gain confidence in your data quality, configure identity rules carefully to align with real-world conventions.

Sync Health on the Home dashboard

When a sync run completes, SDS provides summary statistics in two places:

  • The Home dashboard (Sync Health card)
  • The Sync Health page

Sync Health card

Shows the status of the latest sync run:

  • Running: The run is in progress
  • Completed: No errors or warnings
  • Completed with errors: Errors were found
  • Completed with warnings: Only warnings were found
  • Failed: The run was canceled or encountered a critical issue
  • Error counts: Number of records that failed data validation and were excluded from the cache
  • Warning counts: Number of records in which optional fields failed data validation but the record was still processed

Institution statistics

The Institution data insights card shows counts for active data that passed validation:

  • Organizations: Organizations that have active user roles
  • Users: Users with an active user role associated with an organization
  • Classes: Classes with active user enrollments
  • Enrollments: Active enrollment records for active classes

The User insights card shows:

  • Source users: Users with an active role associated with an organization
  • Mapped users: Users successfully matched to Microsoft Entra ID accounts

Microsoft 365 group and IT group statistics

On the Home dashboard, SDS also shows group statistics:

  • Managed groups in Microsoft Entra ID: Number of groups that SDS is actively managing in the current session

Hover over the chart to see a breakdown by:

  • Class groups
  • Security groups
  • Administrative units

This breakdown helps you quickly understand how SDS-managed groups are distributed across your Microsoft 365 tenant.