Programatically Get ADF pipeline consumption

Everton Oliveira 96 Reputation points
2021-06-17T09:29:49.73+00:00

Hi,

I'm interested in querying the pipeline consumption report that is available from the Data Factory monitor. Is there a table on Log Analytics or PowerShell cmdlet that would return this information? My goal is to aggregate the information available in this report to identify what are the most costly pipelines.

106545-2021-06-17-10-22-20.png

ref: https://techcommunity.microsoft.com/t5/azure-data-factory/new-adf-pipeline-consumption-report/ba-p/1394671

Thank you

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,604 questions
0 comments No comments
{count} votes

Accepted answer
  1. Everton Oliveira 96 Reputation points
    2021-06-21T09:51:08.08+00:00

    Thanks @KranthiPakala-MSFT , below is my version of the script which is more aligned with what I was looking for.

    $startTime = "21/6/2021 7:00:00"  
    $endTime = "21/6/2021 10:00:00"  
    $adf = '<data factory name>'  
    $rg = '<resrouce group name>'  
          
      
    $outputObj = @()  
    $pipelineRuns = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $rg -DataFactoryName $adf -LastUpdatedAfter $startTime -LastUpdatedBefore $endTime  
      
    # loop through all pipelines and child activities to return billable information  
    foreach ($pipelineRun in $pipelineRuns) {  
        $activtiyRuns = Get-AzDataFactoryV2ActivityRun -ResourceGroupName $rg -DataFactoryName $adf -pipelineRunId $pipelineRun.RunId -RunStartedAfter $startTime -RunStartedBefore $endTime  
      
        foreach ($activtiyRun in $activtiyRuns) {  
            if ($null -ne $activtiyRun.Output -and $null -ne $activtiyRun.Output.SelectToken("billingReference.billableDuration")) {              
                  
                $obj = @()  
                $obj = $activtiyRun.Output.SelectToken("billingReference.billableDuration").ToString() | ConvertFrom-Json  
                $obj | Add-Member -MemberType NoteProperty -Name activityType -value $activtiyRun.Output.SelectToken("billingReference.activityType").ToString()  
                $obj | Add-Member -MemberType NoteProperty -Name pipelineName -value $pipelineRun.PipelineName  
                $obj | Add-Member -MemberType NoteProperty -Name activtiyRuns -value $activtiyRuns.Count               
      
                $outputObj += $obj  
            }  
            else {}  
        }  
    }  
      
    # output aggregated result set as table  
    $groupedObj = $outputObj | Group-Object -Property pipelineName, activityType, meterType  
    $groupedObj | ForEach-Object {  
        $value = $_.name -split ', '  
        New-Object psobject -Property @{   
                                     
            activityType              = $value[1];  
            meterType                 = $value[2];  
            pipelineName              = $value[0];  
            executionHours            = [math]::Round(($_.Group | Measure-object -Property duration -sum).Sum, 4)  
            orchestrationActivityRuns = $groupedObj.group.activtiyRuns[0]  
        }   
    } | Sort-Object -Property meterType | Format-Table  
    

    Output sample

    107612-image.png

    Consumption report from the Data Factory monitor

    107510-image.png


2 additional answers

Sort by: Most helpful
  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2021-06-17T23:39:25.217+00:00

    Hi @Everton Oliveira ,

    Welcome to Microsoft Q&A forum and thanks for your query.

    ADF product team has recently published a sample PowerShell script in GitHub samples which can be used to print activity level run details in 45 day range.

    This information can be useful for users who want to know which activity ran when and for how many hours and corehours. This data can be used:

    • To understand which activities ran for how long and corehours behind it.
    • To build future consumption models from past runs.
    • To budget future spend because corehours can be priced from data in pricing calculator.

    Here is the link for the GitHub sample: Simple script that prints activity level run details in 45 day range

    Sample script (please update with variable details accordingly):

    $startTime = "5/5/2021 4:00:00"  
    $endTime = "5/5/2021 7:00:00"  
       
    $pipelineRuns = Get-AzDataFactoryV2PipelineRun -ResourceGroupName ADFRG -DataFactoryName adfcontoso -LastUpdatedAfter $startTime -LastUpdatedBefore $endTime  
       
    foreach($pipelineRun in $pipelineRuns) {  
        $activtiyRuns = Get-AzDataFactoryV2ActivityRun -ResourceGroupName ADFRG -DataFactoryName adfcontoso -pipelineRunId $pipelineRun.RunId -RunStartedAfter $startTime -RunStartedBefore $endTime  
        foreach($activtiyRun in $activtiyRuns) {  
            if ($activtiyRun.Output -ne $null -and  
                    $activtiyRun.Output.SelectToken("billingReference.billableDuration") -ne $null) {  
                Write-Output $activtiyRun.Output.SelectToken("billingReference.billableDuration").ToString() for $activtiyRun.Output.SelectToken("billingReference.activityType").ToString()  
            }  
            else {  
                Write-Output "Not Availble" for $activtiyRun.ActivityType  
            }  
        }  
    }  
    

    Sample output:

    # Here is sample output. Azure IR is used with General Compute meter. It shows corehours and hours spent.   
    <# ExternalActivity  
    [  
      {  
        "meterType": "General",  
        "duration": 0.28871840533333332,  
        "unit": "coreHour",  
        "sessionType": "WarmCluster"  
      }  
    ]  
    for  
    executedataflow  
    [  
      {  
        "meterType": "AzureIR",  
        "duration": 0.016666666666666666,  
        "unit": "Hours"  
      }  
    ]  
    for  
    ExternalActivity   
    #>  
    

    Hope this information helps to give you some pointers. Do let us know how it goes.

    ----------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.

    1 person found this answer helpful.

  2. Jakub Skibicki 1 Reputation point
    2022-01-19T13:22:54.077+00:00

    @Everton Oliveira

    I took you script but did not like two items:

    1. Due to if condition all activities that to not have an output or MeterType are excluded. Which is wrong because all such activities, like loops, waits, actually incur regular cost for activity orchestration with Azure Integration Runtime. if ($null -ne $activtiyRun.Output -and $null -ne $activtiyRun.Output.SelectToken("billingReference.billableDuration")) {
    2. Your calculation of activity runs is also incorrect. For each activity run (single item) it appends a total of activites in pipeline run ($activtiyRuns.Count).

    Hence I propose such version

    Connect-AzAccount
    
    $startTime = &#34;2022-01-19 00:00:01&#34;
    $endTime = &#34;2022-01-19 23:59:59&#34;
    $adf = &#39;yourADF&#39;
    $rg = &#39;ADFsResourceGroup&#39;
            
    
    #Get-AzDataFactoryV2PipelineRun -ResourceGroupName $rg -DataFactoryName $adf -LastUpdatedAfter $startTime -LastUpdatedBefore $endTime
        
     $outputObj = @()
     $pipelineRuns = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $rg -DataFactoryName $adf -LastUpdatedAfter $startTime -LastUpdatedBefore $endTime
     
     $one = 1
     $otherActivity = &#34;OtherActivity&#34;
     $zero = 0
     
     $toAdd =@&#34;
    {
      &#34;meterType&#34;: &#34;AzureIR&#34;
      ,&#34;duration&#34;: 0
      ,&#34;Unit&#34;:&#34;Hours&#34;
    }
    &#34;@
        
     # loop through all pipelines and child activities to return billable information
     foreach ($pipelineRun in $pipelineRuns) {
         $activtiyRuns = Get-AzDataFactoryV2ActivityRun -ResourceGroupName $rg -DataFactoryName $adf -pipelineRunId $pipelineRun.RunId -RunStartedAfter $startTime -RunStartedBefore $endTime
        
         foreach ($activtiyRun in $activtiyRuns) {
             if ($null -ne $activtiyRun.Output -and $null -ne $activtiyRun.Output.SelectToken(&#34;billingReference.billableDuration&#34;)) {
                 $obj1 = @()
                 $obj1 = $activtiyRun.Output.SelectToken(&#34;billingReference.billableDuration&#34;).ToString() | ConvertFrom-Json     
                 $obj1 | Add-Member -MemberType NoteProperty -Name activityType -value $activtiyRun.Output.SelectToken(&#34;billingReference.activityType&#34;).ToString()
                 $obj1 | Add-Member -MemberType NoteProperty -Name pipelineName -value $pipelineRun.PipelineName
                 $obj1 | Add-Member -MemberType NoteProperty -Name activtiyRuns -value $one
                 $obj1 | Add-Member -MemberType NoteProperty -Name PipelineRunId -value $activtiyRun.PipelineRunId 
                 $obj1 | Add-Member -MemberType NoteProperty -Name ActivityRunId -value $activtiyRun.ActivityRunId
                 $obj1 | Add-Member -MemberType NoteProperty -Name ActivityRunName -value $activtiyRun.ActivityName        
                 
                 $outputObj += $obj1   
        
             }
             else {
                $obj2 = @()
                $obj2 = (ConvertFrom-Json -InputObject $toAdd)
                $obj2 | Add-Member -MemberType NoteProperty -Name activityType -value $otherActivity
                $obj2 | Add-Member -MemberType NoteProperty -Name pipelineName -value $pipelineRun.PipelineName
                $obj2 | Add-Member -MemberType NoteProperty -Name activtiyRuns -value $one
                $obj2 | Add-Member -MemberType NoteProperty -Name PipelineRunId -value $activtiyRun.PipelineRunId 
                $obj2 | Add-Member -MemberType NoteProperty -Name ActivityRunId -value $activtiyRun.ActivityRunId
                $obj2 | Add-Member -MemberType NoteProperty -Name ActivityRunName -value $activtiyRun.ActivityName    
                
                $outputObj += $obj2     
             }
    
    
    
         }
     }  
    
     # output aggregated result set as table
     $groupedObj = $outputObj | Group-Object -Property pipelineName, activityType, meterType
     $groupedObj | ForEach-Object {
         $value = $_.name -split &#39;, &#39;
         New-Object psobject -Property @{ 
                                       
             activityType              = $value[1];
             meterType                 = $value[2];
             pipelineName              = $value[0];
             executionHours            = [math]::Round(($_.Group | Measure-object -Property duration -sum).Sum, 4)
             orchestrationActivityRuns = ($_.Group | Measure-object -Property activtiyRuns -Sum).Sum 
         } 
     } | Sort-Object -Property meterType | Format-Table
    
    0 comments No comments