This post explains things that are difficult to find even in English. That’s why I will break my rule and not write it in my native language! (For the Polish version, feel free to use Google Translate :>)
Introduction
Azure Automation is essentially a platform for running PowerShell and Python in the cloud.
In marketing language, it’s a Swiss Army knife 😛
Here’s how Microsoft describes it:
“Azure Automation delivers a cloud-based automation and configuration service that provides consistent management across your Azure and non-Azure environments. It consists of process automation, update management, and configuration features. Azure Automation provides complete control during deployment, operations, and decommissioning of workloads and resources.“
Apart from this gibberish, I will point out some important issues…
Know your Automation
- It has something that is called “a feature” – Fair Share – which basically prevents you from running scripts longer than 3 hours.
- Well, at least it will pause your script after 3 hours. And if you didn’t implement it as a workflow with some checkpoints – it will RESTART your script from the beginning.
- And if you do implement checkpoints, it will resume your script from the last known checkpoint. BUT it will do this only 3 times! So you cannot implement logic that takes more than about 9 hours to process…
- The workaround is to connect your own machine (server or laptop) as a Hybrid Runbook Worker, where the 3-hour limit does not apply.
Read more about fair share here: https://docs.microsoft.com/en-us/azure/automation/automation-runbook-execution#fair-share
Since Azure Data Factory cannot simply pause and resume an activity, we have to assume that a pipeline will not run for more than 3 hours.
Any other scenario requires you to write custom logic – perhaps dividing pipelines into shorter ones and implementing checkpoints between running them (a minimal sketch of that pattern follows).
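If your total processing genuinely exceeds 3 hours, the usual pattern is a PowerShell Workflow runbook that runs the shorter pipelines one by one with a checkpoint after each. A minimal sketch – Invoke-AdfPipelineAndWait is a hypothetical helper standing in for run-and-wait logic like the runbook we build below:

workflow Invoke-PipelineChain {
    # Checkpoint-Workflow persists the workflow state, so after a
    # fair-share suspension the runbook resumes at the last checkpoint
    # instead of restarting from the beginning (up to 3 resumes).
    # Invoke-AdfPipelineAndWait is a hypothetical helper - see the runbook below.
    Invoke-AdfPipelineAndWait -PipelineName "Stage1-Load"
    Checkpoint-Workflow
    Invoke-AdfPipelineAndWait -PipelineName "Stage2-Transform"
    Checkpoint-Workflow
}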
Preparations
Before we create the runbook, we must set up a credential and some variables.
Adding credential
We have to set up a credential that PowerShell will use to handle the pipeline run in Azure Data Factory V2:
- Go to Automation account, under Shared Resources click “Credentials“
- Add a credential. It must be an account with privileges to run and monitor a pipeline in ADF. I will name it “AzureDataFactoryUser”. Set the login and password.
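If you prefer to script the setup, the same credential can be created with the AzureRM Automation cmdlets. A minimal sketch – the account and resource group names are placeholders:

# Placeholders - replace with your own Automation account and resource group
$user = "adf-user@example.com"
$pass = Read-Host "Password for $user" -AsSecureString
$cred = New-Object System.Management.Automation.PSCredential ($user, $pass)

New-AzureRmAutomationCredential `
    -ResourceGroupName "MyResourceGroup" `
    -AutomationAccountName "MyAutomationAccount" `
    -Name "AzureDataFactoryUser" `
    -Value $cred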
Adding variables
We will use variables to parameterize some account information, so we don’t hardcode it in our script.
- Go to Automation account, under Shared Resources click “Variables“
- Add four string variables and set their values. The first points to the credential name, the second provides the data factory name, the third – the resource group name, and the fourth – the ADF’s subscription ID:
ADF-CredentialName
ADF-DataFactoryName
ADF-ResourceGroupName
ADF-SubscriptionID
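The same four variables can also be created from PowerShell. A minimal sketch, again with placeholder account, factory, resource group and subscription values:

# Placeholders - replace with your own names and IDs
$account = "MyAutomationAccount"
$rg      = "MyResourceGroup"

@{
    'ADF-CredentialName'    = 'AzureDataFactoryUser'
    'ADF-DataFactoryName'   = 'MyDataFactory'
    'ADF-ResourceGroupName' = 'MyAdfResourceGroup'
    'ADF-SubscriptionID'    = '00000000-0000-0000-0000-000000000000'
}.GetEnumerator() | ForEach-Object {
    New-AzureRmAutomationVariable -ResourceGroupName $rg -AutomationAccountName $account `
        -Name $_.Key -Value $_.Value -Encrypted $false
}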
Adding AzureRM.DataFactoryV2 module
You have to add a PowerShell module to your Automation account. Just go to “Modules”, click “Browse gallery” and search for “AzureRM.DataFactoryV2“.
Select it from the results list and click “Import“.
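This too can be scripted. A minimal sketch – the account and resource group names are placeholders, and I’m assuming the PowerShell Gallery v2 package feed URL as the content link:

New-AzureRmAutomationModule `
    -ResourceGroupName "MyResourceGroup" `
    -AutomationAccountName "MyAutomationAccount" `
    -Name "AzureRM.DataFactoryV2" `
    -ContentLink "https://www.powershellgallery.com/api/v2/package/AzureRM.DataFactoryV2"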
Creating the runbook
Now we can create a PowerShell runbook.
- Go to Automation portal, under “PROCESS AUTOMATION” click “Runbooks“
- Select “Add a runbook“
- We will use quick create, so select “Create a new runbook“, then name it and select the type “PowerShell“.
- Use the script below in “Edit” mode, then save and publish it.
PowerShell script
Parameters
The script has two parameters:
- PipelineName – the name of the pipeline to run
- CheckLoopTime – the number of seconds to wait between status checks of the triggered pipeline run
The code
param (
    [Parameter(Mandatory=$true)][string]$PipelineName,
    # A default value is ignored on a mandatory parameter, so this one is optional
    [Parameter(Mandatory=$false)][int]$CheckLoopTime = 60
)

#Stop on any error!
$ErrorActionPreference = "Stop"

#Get and set configs
$CredentialName    = Get-AutomationVariable -Name 'ADF-CredentialName'
$ResourceGroupName = Get-AutomationVariable -Name 'ADF-ResourceGroupName'
$DataFactoryName   = Get-AutomationVariable -Name 'ADF-DataFactoryName'
$SubscriptionID    = Get-AutomationVariable -Name 'ADF-SubscriptionID'

#Get credentials
$AzureDataFactoryUser = Get-AutomationPSCredential -Name $CredentialName

try {
    #Use credentials and choose subscription
    Add-AzureRmAccount -Credential $AzureDataFactoryUser | Out-Null
    Set-AzureRmContext -SubscriptionId $SubscriptionID | Out-Null

    #Get the data factory object
    $df = Get-AzureRmDataFactoryV2 -ResourceGroupName $ResourceGroupName -Name $DataFactoryName

    #If it exists - run the pipeline
    If ($df) {
        Write-Output "Connected to data factory $DataFactoryName on $ResourceGroupName as $($AzureDataFactoryUser.UserName)"
        Write-Output "Running pipeline: $PipelineName"

        $RunID   = Invoke-AzureRmDataFactoryV2Pipeline -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -PipelineName $PipelineName
        $RunInfo = Get-AzureRmDataFactoryV2PipelineRun -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -PipelineRunId $RunID

        Write-Output "`nPipeline triggered!"
        Write-Output "RunID: $($RunInfo.RunId)"
        Write-Output "Started: $($RunInfo.RunStart)`n"

        $sw = [System.Diagnostics.Stopwatch]::StartNew()

        #Poll the run status every $CheckLoopTime seconds until it leaves "InProgress"
        While ((Get-AzureRmDataFactoryV2PipelineRun -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -PipelineRunId $RunID | Select-Object -ExpandProperty "Status") -eq "InProgress")
        {
            $RunInfo = Get-AzureRmDataFactoryV2PipelineRun -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -PipelineRunId $RunID
            Write-Output "Last status: $($RunInfo.Status) | Last updated: $($RunInfo.LastUpdated) | Running time: $($sw.Elapsed.ToString('dd\.hh\:mm\:ss'))"
            Start-Sleep $CheckLoopTime
        }
        $sw.Stop()

        $RunInfo = Get-AzureRmDataFactoryV2PipelineRun -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -PipelineRunId $RunID

        Write-Output "`nFinished running in $($sw.Elapsed.ToString('dd\.hh\:mm\:ss'))!"
        Write-Output "Status:"
        Write-Output $RunInfo.Status

        #Fail the runbook if the pipeline did not succeed
        if ($RunInfo.Status -ne "Succeeded") {
            throw "There was an error with running pipeline: $($RunInfo.PipelineName). Returned message was:`n$($RunInfo.Message)"
        }
    }
}
Catch {
    Throw
}
Test run
I will run my test pipeline, which simply starts a Wait activity (5 sec.) and then tries to run a nonexistent procedure (after which the pipeline should fail).
Go to the saved runbook and click “Start“.
Provide parameters like these and click OK:
The runbook will be queued. Go to “Output” and wait for the results.
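You can also start the runbook from PowerShell instead of the portal. A minimal sketch – the runbook, pipeline, account and resource group names are placeholders:

# Placeholders - replace with your own names
Start-AzureRmAutomationRunbook `
    -ResourceGroupName "MyResourceGroup" `
    -AutomationAccountName "MyAutomationAccount" `
    -Name "Run-AdfPipeline" `
    -Parameters @{ PipelineName = "MyTestPipeline"; CheckLoopTime = 15 }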
What are the benefits of running pipelines in runbooks vs. in ADF with triggers? I can see that it maybe gives more control over the running jobs, especially when there are dependencies on other pipelines finishing.
… And are there any significant downsides to running an ADF pipeline under a runbook? Thanks
You could set up a webhook which can execute the runbook.
The webhook is triggered via an HTTPS URL, which can be called from anywhere.
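For example, a minimal sketch with placeholder names – note that the webhook URI is returned only once, at creation time:

# Placeholders - replace with your own names
$wh = New-AzureRmAutomationWebhook `
    -ResourceGroupName "MyResourceGroup" `
    -AutomationAccountName "MyAutomationAccount" `
    -RunbookName "Run-AdfPipeline" `
    -Name "Run-AdfPipeline-Webhook" `
    -IsEnabled $true `
    -ExpiryTime (Get-Date).AddYears(1) `
    -Parameters @{ PipelineName = "MyTestPipeline"; CheckLoopTime = 60 } `
    -Force

# Anything that can make an HTTPS POST can now start the runbook:
Invoke-RestMethod -Method Post -Uri $wh.WebhookURI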