Azure Data Factory

Anandazure 46 Reputation points
2021-10-08T11:59:51.617+00:00

How do I find unused pipelines in ADF, and how do I clean them up?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. svijay-MSFT 5,201 Reputation points Microsoft Employee
    2021-10-11T14:11:46.647+00:00

    Hello @Anandazure ,

    Thanks for the question and using MS Q&A platform.

    From my research, there is no direct way to do this. However, the following approach can serve as an alternative:

    • Get the list of factories (either for the whole subscription or for a single resource group).
    • For each factory, get all the pipelines.
    • For each pipeline, get the pipeline runs for a certain period (say, 45 days).
    • If no pipeline runs are returned, you can conclude that the pipeline has not run in the past 45 days.

    Here is a sample snippet I wrote in PowerShell. Note: this is for demonstration only. You could also use the SDK to achieve the same result.

    Connect-AzAccount
    Set-AzContext -Subscription '<SUBSCRIPTION>'

    # Get all the factories in the subscription
    # (uncomment -ResourceGroupName to limit the search to one resource group)
    $factories = Get-AzDataFactoryV2  # -ResourceGroupName '<RG>'

    # Start with an empty array of unused pipelines
    $UnusedPipelines = @()

    # Query window: the last 45 days
    $windowEnd   = Get-Date
    $windowStart = $windowEnd.AddDays(-45)

    # Iterate through each factory
    foreach ($factory in $factories)
    {
        Write-Host "Working on" $factory.DataFactoryName -ForegroundColor Green
        $pipelines = Get-AzDataFactoryV2Pipeline -DataFactory $factory

        foreach ($pipeline in $pipelines)
        {
            Write-Host "Checking the pipeline:" $pipeline.Name
            $pipelineRuns = Get-AzDataFactoryV2PipelineRun -PipelineName $pipeline.Name -DataFactory $factory -LastUpdatedAfter $windowStart -LastUpdatedBefore $windowEnd

            # No runs returned: the pipeline has not run in the last 45 days
            if (-not $pipelineRuns)
            {
                $UnusedPipelines += $pipeline
                Write-Host ("Not run in 45 days: " + $pipeline.Name)
            }
            else
            {
                Write-Host ("Run in 45 days: " + $pipeline.Name) -ForegroundColor Green
            }
        }
    }
    

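    Before deleting anything, it may help to save the collected list somewhere it can be reviewed. The following is only a sketch: Export-UnusedPipelineList is a hypothetical helper name (not part of the Az module), and it assumes each collected object exposes the Name, DataFactoryName and ResourceGroupName properties that the removal script relies on.

    ```powershell
    # Hypothetical helper (not an Az cmdlet): writes the collected pipelines
    # to a CSV file so they can be reviewed before any deletion.
    function Export-UnusedPipelineList {
        param(
            [object[]] $Pipelines,   # e.g. the $UnusedPipelines array
            [string]   $Path         # CSV file to create
        )
        # Keep only the three identifying columns, in a stable order
        $Pipelines |
            Select-Object ResourceGroupName, DataFactoryName, Name |
            Sort-Object ResourceGroupName, DataFactoryName, Name |
            Export-Csv -Path $Path -NoTypeInformation
    }

    # Usage, once $UnusedPipelines has been filled:
    # Export-UnusedPipelineList -Pipelines $UnusedPipelines -Path .\unused-pipelines.csv
    ```

    The file name is an arbitrary example. A saved copy also gives you something to compare against on a later run.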

    You can check the contents of $UnusedPipelines and then use the script below to remove those pipelines.

    foreach ($unusedPipeline in $UnusedPipelines)
    {
        Remove-AzDataFactoryV2Pipeline -Name $unusedPipeline.Name -DataFactoryName $unusedPipeline.DataFactoryName -ResourceGroupName $unusedPipeline.ResourceGroupName
    }
    

    This will prompt for confirmation on every removal. If you don't want that, you can use the -Force switch.
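    As a rough sketch of how the switch could be used, the removal loop can be wrapped in a small function. Remove-UnusedPipeline and its -DryRun parameter are hypothetical names introduced here. The sketch also assumes Remove-AzDataFactoryV2Pipeline accepts the standard -WhatIf switch for a dry run, which is worth confirming against your installed Az.DataFactory version.

    ```powershell
    # Hypothetical wrapper around the removal loop (not an Az cmdlet).
    function Remove-UnusedPipeline {
        param(
            [object[]] $Pipelines,
            [switch]   $Force,    # skip the per-pipeline confirmation prompt
            [switch]   $DryRun    # assumption: mapped to -WhatIf, nothing is deleted
        )
        foreach ($p in $Pipelines) {
            # Splat the arguments so the long command stays readable
            $removeArgs = @{
                Name              = $p.Name
                DataFactoryName   = $p.DataFactoryName
                ResourceGroupName = $p.ResourceGroupName
                Force             = $Force.IsPresent
                WhatIf            = $DryRun.IsPresent
            }
            Remove-AzDataFactoryV2Pipeline @removeArgs
        }
    }

    # Usage: preview first, then remove without prompts
    # Remove-UnusedPipeline -Pipelines $UnusedPipelines -DryRun
    # Remove-UnusedPipeline -Pipelines $UnusedPipelines -Force
    ```

    Running the dry run first is a cheap safeguard, because a deleted pipeline can only be restored from source control or an exported template.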

    NOTE:

    • You cannot retrieve run data older than 45 days. This is a service-side limit: pipeline run information is not stored beyond 45 days. So a pipeline that runs only once every 60 days might end up in the unused list.

    • Newly created pipelines might not have any pipeline runs yet. These will also be included in the unused list.

    To handle both cases, you could run the above script at intervals (say, once every 50 days), then store and compare the outputs of successive runs. If a specific pipeline from a data factory appears in two successive runs, you can choose to delete only those pipelines.
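    The comparison between two successive runs could look roughly like this. It is a minimal sketch: Get-RepeatedlyUnusedPipeline is a hypothetical helper name, and it assumes each stored entry carries the ResourceGroupName, DataFactoryName and Name of a pipeline.

    ```powershell
    # Hypothetical helper: returns the pipelines that appear in both lists.
    function Get-RepeatedlyUnusedPipeline {
        param(
            [object[]] $Previous,   # unused list from the earlier run
            [object[]] $Current     # unused list from the latest run
        )
        # Pipeline names are only unique within a factory, so the lookup key
        # combines resource group, factory and pipeline name.
        # Keys in a @{} hashtable are matched case-insensitively.
        $seen = @{}
        foreach ($p in $Previous) {
            $seen["$($p.ResourceGroupName)/$($p.DataFactoryName)/$($p.Name)"] = $true
        }
        $Current | Where-Object {
            $seen.ContainsKey("$($_.ResourceGroupName)/$($_.DataFactoryName)/$($_.Name)")
        }
    }

    # Usage, assuming the list from each run was saved with Export-Csv:
    # $previous = Import-Csv .\unused-previous.csv
    # $current  = Import-Csv .\unused-current.csv
    # Get-RepeatedlyUnusedPipeline -Previous $previous -Current $current
    ```

    Only the pipelines returned by this comparison would then be candidates for deletion. The CSV file names are placeholders.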

    To handle only the second case (newly created pipelines): pipelines that are actively being worked on may have debug runs. There is currently no cmdlet for getting debug runs; however, you could call the REST API at the following endpoint:

    https://management.azure.com/subscriptions/<subscription>/resourcegroups/<RG>/providers/Microsoft.DataFactory/factories/<factoryname>/querydebugpipelineruns?api-version=2018-06-01

    Check whether there are any debug runs. If there are none, you could safely delete the pipeline.

    But this will still not handle scenarios of newly created pipelines that don't have any debug runs.
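    For illustration, the debug-run endpoint above could be called with Invoke-AzRestMethod from Az.Accounts. Treat this as a sketch under assumptions: New-DebugRunQuery is a hypothetical helper name, and the POST method, request body and "value" property are borrowed from the documented queryPipelineRuns operation. Whether the debug endpoint behaves the same way is an assumption you would need to verify.

    ```powershell
    # Hypothetical helper: builds the path and body for the debug-run query.
    # Assumption: the endpoint takes the same POST body as queryPipelineRuns.
    function New-DebugRunQuery {
        param(
            [string] $SubscriptionId,
            [string] $ResourceGroupName,
            [string] $DataFactoryName,
            [int]    $Days = 45          # how far back to look
        )
        $now = (Get-Date).ToUniversalTime()
        $payload = @{
            lastUpdatedAfter  = $now.AddDays(-$Days).ToString('o')
            lastUpdatedBefore = $now.ToString('o')
        } | ConvertTo-Json
        $path = "/subscriptions/$SubscriptionId/resourcegroups/$ResourceGroupName" +
                "/providers/Microsoft.DataFactory/factories/$DataFactoryName" +
                "/querydebugpipelineruns?api-version=2018-06-01"
        [pscustomobject]@{ Path = $path; Payload = $payload }
    }

    # Usage (needs Az.Accounts and a signed-in context):
    # $q = New-DebugRunQuery -SubscriptionId '<subscription>' -ResourceGroupName '<RG>' -DataFactoryName '<factoryname>'
    # $response = Invoke-AzRestMethod -Method POST -Path $q.Path -Payload $q.Payload
    # ($response.Content | ConvertFrom-Json).value   # assumed to hold the debug runs
    ```

    If the returned list is empty for the factory, none of its pipelines had a debug run in that window.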

    • Also, from my testing, only the pipeline runs of the published branch (CI/CD) are returned; pipelines that exist only in a development branch are not reflected. Please double-check this on your side and confirm.

    Hope this helps. Please let us know if you have any further queries.
