Share via


Data Lake Schema - Rostering

The School Data Sync (SDS) data lake contains the internal representation of the data that's imported from the institution’s Student Information System (SIS)/Student Management System (SMS). The exported data is slightly different than the data that's imported as the data is normalized and stored longitudinally. The longitudinal aspect of the lake is used to assist with providing analytics over users, classes, and data that may no longer be active in Microsoft 365.

In many cases, the data is enhanced with a FirstSeenDateTime and a LastSeenDateTime. SDS doesn't know when data is added/removed from the external SIS, it simply knows when it first or last saw the data.

FirstSeenDateTime is the time the record was first seen by SDS and the creation data of that row in the table. This does NOT mean it's the creation date of the record in the SIS / SMS as SDS might have run well after the data was added to the external system.

In a similar fashion, LastSeenDateTime is the moment SDS last saw the data in a sync. This isn't the deletion date of the record from the SIS / SMS, but the date (time) of the last time SDS saw the record during a sync.

The Related object column below indicates if this ID is a foreign key to a different table.

Domain: System

System tables represent data that SDS uses to hold together the system. SourceSystem indicates which external system the data is coming from. RefDefinition is the List of Values (ENUMS) used throughout the system. Our Enums are used for items like Org Type or User Roles. In a future version of SDS, customers will be able to augment the Enums used in the system. In a future version of SDS, RefTranslation will allow IT to give translated text for Enums that are added to the system.

Export type: SourceSystem

Column Name Column Type Related Object Description
ID String
Name String
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset

Export type: RefDefinition

Know List of Values (LoVs) or ENUMS. Used to validate incoming data that are being associated with the default values. See Default List of Values.

Column Name Column Type Related Object Description
ID String
Namespace String Namespace of the entity who created the RefDefinition. Managed by Microsoft.
RefType String LoV / ENUM name. RefTypes of the same name represent the same LoV’s | ENUMs.
Code String Values for the LoV / ENUM of the RefType
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset
SortOrder Int32 Allows the LoV /ENUM to be sorted in the UX or for Organizational Roles to help define priority order.

Export type: RefTranslation

Currently not used. May be used for customer supplied LoV values.

Column Name Column Type Related Object Description
ID String RefDefinition.Id
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset
Language String Language code of the translated string.
LocalizedName String Allows the Code from RefDefinition to have a display string, which can be translated into different languages.

Domain: Time

Sessions represent time in the system. Roles, Sections, and other key objects MUST have a session to identify when objects are available in the system. Also see Academic Year Handling for additional information on the association and handling of data that is tied to an academic year.

Export type: Session

Column Name Column Type Related Object Description
ID String
SourceSystemId String SourceSystem.Id System this Time definition came from.
ExternalId String
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset
Name String
RefSessionTypeId String RefDefinition.Id Session Type LoV
RefAcademicYearId String RefDefinition.Id School Year LoV - YYYY
StartDate String String in ISO 6801 Date format.
EndDate String String in ISO 6801 Date format.
ParentSessionId String Allows a hierarchy of Sessions including terms and semester.

Domain: Organizations

Organizations represent institution entities. Organizations aren't linked to a session. Only the person's association to an organization and role is linked to a session.

Export type: Organization

Column Name Column Type Related Object Description
ID String
SourceSystemId String SourceSystem.Id System, which the Organization came from.
ExternalId String
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset
Name String
Identifier String Example. NCES ID for the organization.
RefOrganizationTypeId String RefDefinition.Id Organization Type LoV
ParentOrganizationId String Allows a hierarchy of Organizations.

Domain: People

People represent person records. These aren't linked to a session. Only the person's association to an organization and role is linked to a session.

Export type: Person

Column Name Column Type Related Object Description
ID String
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset
Surname String If not supplied can join to Microsoft Entra users where available.
GivenName String If not supplied can join to Microsoft Entra users where available.
MiddleName String If not supplied can join to Microsoft Entra users where available.
PreferredSurname String
PreferredGivenName String
PreferredMiddleName String

Export type: PersonRelationship

PersonRelathionship indicates a relationship between two people. The relationship is stored in one direction. For instance, if the relationship contact is parent/guardian, the child will be represented by the PersonId column while the contact will be represented in the RelatedPersonId column.

Column Name Column Type Related Object Description
ID String
PersonId String Person.Id
RelatedPersonId String Person.Id
RefPersonRelationshipId String RefDefinition.Id Relationship Role LoV value
FirstSeenDateTime DateTimeOffset

Export type: PersonIdentifier

Person can have multiple identifiers from multiple systems. The identifier from the SIS/SMS, Microsoft Entra ID, etc. are all stored in this table.

Column Name Column Type Related Object Description
ID String From the source system or generated.
PersonId String PersonId Person, which the identifier relates to.
SourceSystemId String SourceSystem.Id System, which the identifer came from.
RefIdentifierTypeId String RefDefintion.Id Identifier ENUM
Identifier String Value of the Identifier
FirstSeenDateTime DateTimeOffSet
IsPresentInSource Boolean Indicates whether the identifer is still present in its associated source data.

Export type: PersonEmailAddress

PersonEmailAddress contains the last updated set of information for a given Person.

Column Name Column Type Related Object Description
ID String
PersonId String PersonId Person who the mail belongs to.
EmailAddress String
PriorityOrder Int32
RefEmailAddressTypeId String RefDefinition.Id Email Address ENUM
FirstSeenDateTime DateTimeOffset

Export type: PersonPhoneNumber

PersonPhoneNumber contains the last updated set of information for a given Person.

Column Name Column Type Related Object Description
ID String
PersonId String PersonId Person who is related to phone number
PhoneNumber String
PriorityOrder Int32
RefPhoneNumberTypeId String Refdefinition.Id Phone Number type ENUM
FirstSeenDateTime DateTimeOffset

Export type: PersonDemographic

PersonDemographic lifetime is tied to the updated cadence of Person; once Person is no longer being updated, no changes will happen to PersonDemographic or any downstream table.

Column Name Column Type Related Object Description
PersonId String PersonId Person whose record belongs to
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset
RefSexId String RefDefinition.Id Demographic Gender LoV
BirthDate String Refdefinition.Id String in ISO 8601 Date format
BirthCity String
BirthState String Values are US-specific.
BirthCountyCode String Not linked to LoV

Export type: PersonDemographicEthnicity

PersonDemographicEthnicity contains the last updated set of information for a given Person.

Column Name Column Type Related Object Description
ID String
PersonId String PersonId Person who this record belongs to
RefEthnicityId String RefDefinition.Id Demographic Ethnicity LoV value
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset

Export type: PersonDemographicRace

PersonDemographicRace contains the last updated set of information for a given Person.

Column Name Column Type Related Object Description
ID String
PersonId String PersonId Person who this record belongs to
RefRaceId String RefDefinition.Id Demographic Race LoV value
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset

Export type: PersonDemographicPersonFlag

PersonDemographicPersonFlag contains the last updated set of information for a given Person.

Column Name Column Type Related Object Description
ID String
PersonId String PersonId Person who this record belongs to
RefPersonFlagId String RefDefinition.Id User Flag LoV value
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset

Domain: Academic Groups

Export type: Section

Column Name Column Type Related Object Description
ID String
SourceSystemId String

SourceSystem.Id
System, which the section came from.
ExternalId String
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset
Name String
OrganizatonId String Organization.Id Organization, which this section belongs to.
CourseId String Course.Id Course that is associated to this section.
Code String
Location String

ExportType: SectionSession

Intersection table between Section and Session. The relationship is many to many.

Column Name Column Type Related Object Description
ID String
SectionId String Section.Id Section of the relationship
SessionId String Session.Id Session of the relationship
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset
IsActiveInSession Boolean Identifies whether the Section is still active in the current Session. If it's no longer synchronized from the Source, it will be FALSE.

Export type: SectionGradeLevel

SectionGradeLevel only contains the last read version of a given subject.

Column Name Column Type Related Object Description
ID String
SectionId String Section.Id
RefGradeLevelId String RefDefinition.Id Grade level LoV value
FirstSeenDateTime DateTimeOffset

Export type: SectionSubject

SectionSubject only contains the last-provided set of subjects for a given section.

Column Name Column Type Related Object Description
ID String
SectionId String Section.Id Section, which this subject relates to.
RefAcademicSubjectId String RefDefinition.Id Subject LoV value
FirstSeenDateTime DateTimeOffset

Export type: Course

Column Name Column Type Related Object Description
ID String
SourceSystemId String SourceSystem.Id System, which the Course came from.
ExternalId String
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffeset
Name String
OrganizationId String Organization.Id Organization, which the Course is linked to.
IsActiveInSession Boolean Indicates whether the Course is still active in the current Session. If it's no longer synchronized from the Source, it will be FALSE.
Code String
AcademicYearSessionId String Session.Id Session.Id of the session of type ‘schoolYear’ this course is associated with.

Export type: CourseGradeLevel

CourseGradeLevel contains the most-recently sourced set of data for a Course.

Column Name Column Type Related Object Description
ID String
CourseId String Course.Id Course, which this grade level relates to
RefGradeLevelId String RefDefinition.Id Grade Level LoV value
FirstSeenDateTime DateTimeOffset

Export type: CourseSubject

CourseSubject contains the most-recently sourced set of data for a Course.

Column Name Column Type Related Object Description
ID String
CourseId String Course.Id Course, which this subject belongs to
RefAcademicSubjectId String RefDefinition.Id Subject LoV value
FirstSeenDateTime DateTimeOffset

Domain: Affiliations

Export type: PersonOrganizationRole

The relationship between a Person, Organization and Session (time).

Column Name Column Type Related Object Description
ID String
SourceSystemId String SourceSystem.Id System, which the organization affiliation came from.
ExternalId String
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset
OrganizationId String Organization.Id Organization this affiliation belongs to
PersonId String Person.Id Person who this affiliation belongs to
RefRoleId String RefDefinition.Id Organization Role LoV value
SessionId String Session.Id Session for which this affiliation is valid
IsActiveInSession Boolean Defines whether the affiliation is still active in the current Session. If it's no longer synchronized from the Source, it will be FALSE
RoleStartDate String Date Org and Role affiliation began value that may be sent from the Source for this Session (not the date the record first seen).
RoleEndDate String Date Org and Role affiliation stopped value that may be sent from the Source for this Session (not the date the record was last seen).
IsPrimary Boolean Defines if this Org and Role is the primary affiliation that may be sent from the Source.
RefGradeLevelId String RefDefinition.Id Grade Level LoV value

Export type: Enrollment

The relationship between a Person and a Section. Time (what is current right now) is also important and is determined by the Section Session relationship. The presence of SectionSession table indicates that enrollments span possibly non-contiguous Sessions.

Column Name Column Type Related Object Description
ID String
SourceSystemId String SourceSystem.Id System, which the organization affiliation came from.
ExternalId String
FirstSeenDateTime DateTimeOffset
LastSeenDateTime DateTimeOffset
PersonId String Person.Id Person who this affiliation belongs to
SectionId String Section.Id Section, which this affiliation belongs to
RefSectionRoleId String RefDefinition.Id Organization Role LoV value
IsActiveInSession Boolean Defines whether the affiliation is still active in the current Session. If it's no longer synchronized from the Source it will be FALSE
IsPrimaryStaffForSection Boolean Defines if this Org and Role is the primary affiliation that may be sent from the Source.
EntryDate String Date Org and Role affiliation began value that may be sent from the Source for this Session (not the date the record first seen).
ExitDate String Date Org and Role affiliation stopped value that may be sent from the Source for this Session (not the date the record was last seen).