FIM 2010 R2 Reporting ETL Processes

Extract, transform, and load (ETL) is a process used in data warehousing. For FIM 2010 R2 Reporting, the process involves extracting data from the FIM Service database, transforming it to the Data Warehouse schema, and then loading it into the data warehouse. Moving the data between FIM and SCSM involves steps on both the FIM server and the SCSM server. The following sections identify the processes that occur on each server.

FIM 2010 R2-Specific ETL Processes

The processes that occur on the FIM 2010 R2 server revolve around staging data for export to the SCSM Data Warehouse. FIM 2010 R2 Reporting has the following three processes that stage data for the SCSM server (an invocation example follows the list):

  • Initial ETL – The Initial ETL performs a full upsert of all dimensional values persisted in the data warehouse. The set of objects and attributes to be pushed is limited to the objects identified by the mapping between FIM and the Data Warehouse. It may take a significant amount of time to complete. The Initial ETL must run at the inception of the Data Warehouse store. This process is launched by using the Start-FIMReportingInitialSync.ps1 PowerShell script.

  • Initial Partial ETL – The Initial Partial ETL process is used when additional FIM objects and attributes are added. It loads information into the data warehouse based on the new object and attribute mappings. It is important to note that Incremental ETL may run before the Initial Partial ETL does; in that case, Incremental will start flowing data for the new objects and attributes, and a subsequent Initial Partial run will flow the remaining data. This process is launched by using the Start-FIMReportingInitialPartialSync.ps1 PowerShell script.

  • Incremental ETL – The Incremental ETL process pushes changes in the FIM Service database to the data warehouse. A watermark is used to determine the changes: the process compares watermarks from the FIM Service and the Data Warehouse. If a batch fails to process, the process retries that batch once before stopping the ETL run. It is important to note that if the process encounters an error, it leaves the watermark untouched, so the next Incremental ETL run replays the batch. Writes to dimensional data in the data warehouse are idempotent; writes to fact data are not, so a replayed batch can result in duplicate change records in the data warehouse. This process is launched by using the Start-FIMReportingIncrementalSync.ps1 PowerShell script.
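
For example, each process can be started from a PowerShell prompt on the FIM Service machine. The following is a minimal sketch: the installation path is an assumption and may differ in your deployment, and because only one ETL process may run at a time, run only one of these scripts at once.

# The script location below is an assumption; adjust it for your deployment.
Set-Location "C:\Program Files\Microsoft Forefront Identity Manager\2010\Reporting\PowerShell"

# First load of the data warehouse (run once, at the inception of the store):
.\Start-FIMReportingInitialSync.ps1

# After adding new object and attribute mappings:
# .\Start-FIMReportingInitialPartialSync.ps1

# Ongoing operation, pushing changes since the last watermark:
# .\Start-FIMReportingIncrementalSync.ps1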

The invocation of the ETL processes is managed by ReportingJob objects in FIM. Creating a ReportingJob signals FIM to start an ETL process. Only one ETL process should run at a time; to ensure this, before processing a request to create a ReportingJob, FIM determines whether any jobs are already active. If there are, the request is rejected.

Each ReportingJob object contains a status attribute, which is updated as the ETL process moves data to the data warehouse. The status can be viewed on the Extended Attributes tab of the ReportingJob.
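
The status can also be inspected from PowerShell. The following is a minimal sketch, assuming the FIMAutomation snap-in is available on the FIM Service machine; the attribute name ReportingJobStatus is an assumption, so verify it against your schema.

# Load the FIM Service cmdlets if they are not already available.
if (@(Get-PSSnapin | Where-Object {$_.Name -eq "FIMAutomation"}).Count -eq 0)
{
    Add-PSSnapin FIMAutomation
}

# Export all ReportingJob resources from the FIM Service.
$jobs = Export-FIMConfig -OnlyBaseResources -CustomConfig "/ReportingJob"

foreach ($job in $jobs)
{
    # Each exported resource carries its attributes as name/value pairs.
    foreach ($attribute in $job.ResourceManagementObject.ResourceManagementAttributes)
    {
        # "ReportingJobStatus" is an assumed attribute name; verify it in your schema.
        if ($attribute.AttributeName -eq "ReportingJobStatus")
        {
            Write-Host ("Status: " + $attribute.Value)
        }
    }
}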

(Screenshot: Check Initial Status)

SCSM-Specific ETL Processes

The extract, transform, and load (ETL) process that runs on the SCSM server takes the data that was staged by the processes above and loads it into the Data Warehouse. This process runs at a set interval on the SCSM Data Warehouse, but it can also be kicked off by using a script similar to the one below, which allows you to see the data in the reporting store immediately. For additional information about managing the data warehouse, see Managing the Data Warehouse in Service Manager (https://technet.microsoft.com/en-us/library/ff460931.aspx).

Note

You can speed up your initial reporting load by running this script in a loop while the FIM Initial ETL is running (a sample loop wrapper follows the script).

# Load the Service Manager cmdlets if they are not already available.
if (@(Get-PSSnapin | Where-Object {$_.Name -eq "SMCmdletSnapIn"}).Count -eq 0)
{
    Add-PSSnapin SMCmdletSnapIn
}

# The data warehouse server name is read from the IMT.DataWarehouse environment variable.
$dwMachine = [environment]::GetEnvironmentVariable("IMT.DataWarehouse")

# Poll the data warehouse until the job batch with the given ID completes.
Function WaitForId($id)
{
    Write-Host ("Waiting on the job " + $id)
    do
    {
        $job = Get-SCDWJob -ComputerName $dwMachine -JobBatchId $id
        Start-Sleep -Seconds 5
        Write-Host (".") -NoNewline
    }
    while ($job.EndTime -eq $null)
    Write-Host ("Job " + $id + " is done")
}

# Return the batch ID of the named job if it is currently running; otherwise $null.
Function FindId($jobName)
{
    $job = Get-SCDWJob -ComputerName $dwMachine -JobName $jobName
    if($job.Status -eq "Running")
    {
        return $job.BatchId
    }
    else
    {
        return $null
    }
}

# Collect the names of all extract jobs registered on the data warehouse.
Function GetExtractJobNames
{
    $results = New-Object System.Collections.ArrayList
    $jobs = Get-SCDWJob -ComputerName $dwMachine 
    foreach ($job in $jobs)
    {
        if($job.CategoryName -eq "Extract")
        {
            [void]$results.Add($job.Name)
        }
    }
    return $results
}

# Wait for any in-flight instance of the job, start a fresh run, and wait for it to finish.
Function RunJob($jobName)
{
    $currentId = FindId($jobName)
    if($currentId -ne $null)
    {
        Write-Host ("Waiting for the previous job " + $jobName)
        WaitForId($currentId)
    }
    Write-Host ("Starting the new job " + $jobName)
    
    Start-SCDWJob -ComputerName $dwMachine -JobName $jobName
    
    $currentId = FindId($jobName)
    if($currentId -ne $null)
    {
        Write-Host ("Waiting for the new job " + $jobName)
        WaitForId($currentId)
    }
}

# Run the full ETL cycle: every extract job, then the transform, then the load.
Function RunETL()
{
    $extractJobs = GetExtractJobNames
    foreach($jobName in $extractJobs)
    {
        RunJob $jobName
    }
    RunJob "Transform.Common"
    RunJob "Load.Common"
}

# Fail fast if the data warehouse machine name was never configured.
if ($dwMachine -eq $null)
{
    Write-Error ("IMT.DataWarehouse was not set")
}
else
{
    # Time the full ETL cycle.
    $start = Get-Date
    RunETL
    $end = Get-Date
    $ts = New-TimeSpan -Start $start -End $end
    Write-Host ("Took " + $ts.TotalMinutes + " total minutes for ETL")
}
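
To follow the note above, the script can be wrapped in a simple loop while the FIM Initial ETL is running. The wrapper below is a hypothetical sketch; Run-SCDWETL.ps1 is an assumed name for the script above saved to disk.

# Run-SCDWETL.ps1 is an assumed file name for the ETL script above.
# Stop the loop (Ctrl+C) once the FIM Initial ETL has completed.
while ($true)
{
    .\Run-SCDWETL.ps1
    Start-Sleep -Seconds 60
}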