Manage Azure Data Lake Analytics using Azure PowerShell
Važno
Azure Data Lake Analytics retired on 29 February 2024. Learn more with this announcement.
For data analytics, your organization can use Azure Synapse Analytics or Microsoft Fabric.
This article describes how to manage Azure Data Lake Analytics accounts, data sources, users, and jobs by using Azure PowerShell.
Napomena
We recommend that you use the Azure Az PowerShell module to interact with Azure. See Install Azure PowerShell to get started. To learn how to migrate to the Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.
To use PowerShell with Data Lake Analytics, collect the following pieces of information:
- Subscription ID: The ID of the Azure subscription that contains your Data Lake Analytics account.
- Resource group: The name of the Azure resource group that contains your Data Lake Analytics account.
- Data Lake Analytics account name: The name of your Data Lake Analytics account.
- Default Data Lake Store account name: Each Data Lake Analytics account has a default Data Lake Store account.
- Location: The location of your Data Lake Analytics account, such as "East US 2" or other supported locations.
The PowerShell snippets in this tutorial use these variables to store this information
$subId = "<SubscriptionId>"
$rg = "<ResourceGroupName>"
$adla = "<DataLakeAnalyticsAccountName>"
$adls = "<DataLakeStoreAccountName>"
$location = "<Location>"
Log in using a subscription ID or by subscription name
# Using subscription id
Connect-AzAccount -SubscriptionId $subId
# Using subscription name
Connect-AzAccount -SubscriptionName $subname
The Connect-AzAccount
cmdlet always prompts for credentials. You can avoid being prompted by using the following cmdlets:
# Save login session information
Save-AzAccounts -Path D:\profile.json
# Load login session information
Select-AzAccounts -Path D:\profile.json
$tenantid = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
$spi_appname = "appname"
$spi_appid = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
$spi_secret = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
$pscredential = New-Object System.Management.Automation.PSCredential ($spi_appid, (ConvertTo-SecureString $spi_secret -AsPlainText -Force))
Login-AzAccount -ServicePrincipal -TenantId $tenantid -Credential $pscredential -Subscription $subid
# List Data Lake Analytics accounts within the current subscription.
Get-AdlAnalyticsAccount
# List Data Lake Analytics accounts within a specific resource group.
Get-AdlAnalyticsAccount -ResourceGroupName $rg
Every Data Lake Analytics account requires a default Data Lake Store account that it uses for storing logs. You can reuse an existing account or create an account.
# Create a data lake store if needed, or you can re-use an existing one
New-AdlStore -ResourceGroupName $rg -Name $adls -Location $location
New-AdlAnalyticsAccount -ResourceGroupName $rg -Name $adla -Location $location -DefaultDataLake $adls
Get details about an account.
Get-AdlAnalyticsAccount -Name $adla
Test-AdlAnalyticsAccount -Name $adla
Azure Data Lake Analytics currently supports the following data sources:
Every Data Lake Analytics account has a default Data Lake Store account. The default Data Lake Store account is used to store job metadata and job audit logs.
$adla_acct = Get-AdlAnalyticsAccount -Name $adla
$dataLakeStoreName = $adla_acct.DefaultDataLakeAccount
You can find the default Data Lake Store account by filtering the list of datasources by the IsDefault
property:
Get-AdlAnalyticsDataSource -Account $adla | ? { $_.IsDefault }
# Add an additional Storage (Blob) account.
$AzureStorageAccountName = "<AzureStorageAccountName>"
$AzureStorageAccountKey = "<AzureStorageAccountKey>"
Add-AdlAnalyticsDataSource -Account $adla -Blob $AzureStorageAccountName -AccessKey $AzureStorageAccountKey
# Add an additional Data Lake Store account.
$AzureDataLakeStoreName = "<AzureDataLakeStoreAccountName"
Add-AdlAnalyticsDataSource -Account $adla -DataLakeStore $AzureDataLakeStoreName
# List all the data sources
Get-AdlAnalyticsDataSource -Account $adla
# List attached Data Lake Store accounts
Get-AdlAnalyticsDataSource -Account $adla | where -Property Type -EQ "DataLakeStore"
# List attached Storage accounts
Get-AdlAnalyticsDataSource -Account $adla | where -Property Type -EQ "Blob"
$script = @"
@a =
SELECT * FROM
(VALUES
("Contoso", 1500.0),
("Woodgrove", 2700.0)
) AS D( customer, amount );
OUTPUT @a
TO "/data.csv"
USING Outputters.Csv();
"@
$scriptpath = "d:\test.usql"
$script | Out-File $scriptpath
Submit-AdlJob -AccountName $adla -Script $script -Name "Demo"
$scriptpath = "d:\test.usql"
$script | Out-File $scriptpath
Submit-AdlJob -AccountName $adla –ScriptPath $scriptpath -Name "Demo"
The output includes the currently running jobs and those jobs that have recently completed.
Get-AdlJob -Account $adla
By default the list of jobs is sorted on submit time. So the most recently submitted jobs appear first. By default, The ADLA account remembers jobs for 180 days, but the Get-AdlJob cmdlet by default returns only the first 500. Use -Top parameter to list a specific number of jobs.
$jobs = Get-AdlJob -Account $adla -Top 10
Using the -State
parameter. You can combine any of these values:
Accepted
Compiling
Ended
New
Paused
Queued
Running
Scheduling
Start
# List the running jobs
Get-AdlJob -Account $adla -State Running
# List the jobs that have completed
Get-AdlJob -Account $adla -State Ended
# List the jobs that have not started yet
Get-AdlJob -Account $adla -State Accepted,Compiling,New,Paused,Scheduling,Start
Use the -Result
parameter to detect whether ended jobs completed successfully. It has these values:
- Canceled
- Failed
- None
- Succeeded
# List Successful jobs.
Get-AdlJob -Account $adla -State Ended -Result Succeeded
# List Failed jobs.
Get-AdlJob -Account $adla -State Ended -Result Failed
The -Submitter
parameter helps you identify who submitted a job.
Get-AdlJob -Account $adla -Submitter "joe@contoso.com"
The -SubmittedAfter
is useful in filtering to a time range.
# List jobs submitted in the last day.
$d = [DateTime]::Now.AddDays(-1)
Get-AdlJob -Account $adla -SubmittedAfter $d
# List jobs submitted in the last seven day.
$d = [DateTime]::Now.AddDays(-7)
Get-AdlJob -Account $adla -SubmittedAfter $d
Get the status of a specific job.
Get-AdlJob -AccountName $adla -JobId $job.JobId
Stop-AdlJob -Account $adla -JobID $jobID
Instead of repeating Get-AdlAnalyticsJob
until a job finishes, you can use the Wait-AdlJob
cmdlet to wait for the job to end.
Wait-AdlJob -Account $adla -JobId $job.JobId
Use the Get-AdlJobPipeline
cmdlet to see the pipeline information previously submitted jobs.
$pipelines = Get-AdlJobPipeline -Account $adla
$pipeline = Get-AdlJobPipeline -Account $adla -PipelineId "<pipeline ID>"
Use the Get-AdlJobRecurrence
cmdlet to see the recurrence information for previously submitted jobs.
$recurrences = Get-AdlJobRecurrence -Account $adla
$recurrence = Get-AdlJobRecurrence -Account $adla -RecurrenceId "<recurrence ID>"
The Get-AdlAnalyticsComputePolicy
cmdlet retrieves info about compute policies for a Data Lake Analytics account.
$policies = Get-AdlAnalyticsComputePolicy -Account $adla
The New-AdlAnalyticsComputePolicy
cmdlet creates a new compute policy for a Data Lake Analytics account. This example sets the maximum AUs available to the specified user to 50, and the minimum job priority to 250.
$userObjectId = (Get-AzAdUser -SearchString "garymcdaniel@contoso.com").Id
New-AdlAnalyticsComputePolicy -Account $adla -Name "GaryMcDaniel" -ObjectId $objectId -ObjectType User -MaxDegreeOfParallelismPerJob 50 -MinPriorityPerJob 250
Test-AdlStoreItem -Account $adls -Path "/data.csv"
Upload a file.
Import-AdlStoreItem -AccountName $adls -Path "c:\data.tsv" -Destination "/data_copy.csv"
Upload an entire folder recursively.
Import-AdlStoreItem -AccountName $adls -Path "c:\myData\" -Destination "/myData/" -Recurse
Download a file.
Export-AdlStoreItem -AccountName $adls -Path "/data.csv" -Destination "c:\data.csv"
Download an entire folder recursively.
Export-AdlStoreItem -AccountName $adls -Path "/" -Destination "c:\myData\" -Recurse
Napomena
If the upload or download process is interrupted, you can attempt to resume the process by running the cmdlet again with the -Resume
flag.
The U-SQL catalog is used to structure data and code so they can be shared by U-SQL scripts. The catalog enables the highest performance possible with data in Azure Data Lake. For more information, see Use U-SQL catalog.
# List U-SQL databases
Get-AdlCatalogItem -Account $adla -ItemType Database
# List tables within a database
Get-AdlCatalogItem -Account $adla -ItemType Table -Path "database"
# List tables within a schema.
Get-AdlCatalogItem -Account $adla -ItemType Table -Path "database.schema"
$dbs = Get-AdlCatalogItem -Account $adla -ItemType Database
foreach ($db in $dbs)
{
$asms = Get-AdlCatalogItem -Account $adla -ItemType Assembly -Path $db.Name
foreach ($asm in $asms)
{
$asmname = "[" + $db.Name + "].[" + $asm.Name + "]"
Write-Host $asmname
}
}
# Get details of a table
Get-AdlCatalogItem -Account $adla -ItemType Table -Path "master.dbo.mytable"
# Test existence of a U-SQL database.
Test-AdlCatalogItem -Account $adla -ItemType Database -Path "master"
Within a U-SQL database, create a credential object for a database hosted in Azure. Currently, U-SQL credentials are the only type of catalog item that you can create through PowerShell.
$dbName = "master"
$credentialName = "ContosoDbCreds"
$dbUri = "https://contoso.database.windows.net:8080"
New-AdlCatalogCredential -AccountName $adla `
-DatabaseName $db `
-CredentialName $credentialName `
-Credential (Get-Credential) `
-Uri $dbUri
Get-AdlAnalyticsFirewallRule -Account $adla
$ruleName = "Allow access from on-prem server"
$startIpAddress = "<start IP address>"
$endIpAddress = "<end IP address>"
Add-AdlAnalyticsFirewallRule -Account $adla -Name $ruleName -StartIpAddress $startIpAddress -EndIpAddress $endIpAddress
Set-AdlAnalyticsFirewallRule -Account $adla -Name $ruleName -StartIpAddress $startIpAddress -EndIpAddress $endIpAddress
Remove-AdlAnalyticsFirewallRule -Account $adla -Name $ruleName
Set-AdlAnalyticsAccount -Name $adla -AllowAzureIpState Enabled
Set-AdlAnalyticsAccount -Name $adla -FirewallState Enabled
Set-AdlAnalyticsAccount -Name $adla -FirewallState Disabled
Resolve-AzError -Last
function Test-Administrator
{
$user = [Security.Principal.WindowsIdentity]::GetCurrent();
$p = New-Object Security.Principal.WindowsPrincipal $user
$p.IsInRole([Security.Principal.WindowsBuiltinRole]::Administrator)
}
From a subscription name:
function Get-TenantIdFromSubscriptionName( [string] $subname )
{
$sub = (Get-AzSubscription -SubscriptionName $subname)
$sub.TenantId
}
Get-TenantIdFromSubscriptionName "ADLTrainingMS"
From a subscription ID:
function Get-TenantIdFromSubscriptionId( [string] $subid )
{
$sub = (Get-AzSubscription -SubscriptionId $subid)
$sub.TenantId
}
$subid = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Get-TenantIdFromSubscriptionId $subid
From a domain address such as "contoso.com"
function Get-TenantIdFromDomain( $domain )
{
$url = "https://login.windows.net/" + $domain + "/.well-known/openid-configuration"
return (Invoke-WebRequest $url|ConvertFrom-Json).token_endpoint.Split('/')[3]
}
$domain = "contoso.com"
Get-TenantIdFromDomain $domain
$subs = Get-AzSubscription
foreach ($sub in $subs)
{
Write-Host $sub.Name "(" $sub.Id ")"
Write-Host "`tTenant Id" $sub.TenantId
}
- Overview of Microsoft Azure Data Lake Analytics
- Get started with Data Lake Analytics using the Azure portal | Azure PowerShell | Azure CLI
- Manage Azure Data Lake Analytics using Azure portal | Azure PowerShell | CLI