Azure Data Lake Store PowerShell Toolkit
Introduction
Working with Azure Data Lake Store can be difficult, especially when an action must be performed on several items. Because there is currently no GUI tool for this, PowerShell can be used to perform various tasks. The toolkit described in this article contains several scripts that make automation in the Data Lake a little easier.
How To Use
- Download AzureDataLakeStoreTookit.zip from TechNet Gallery
- Unzip to a local folder
- Run the scripts in an elevated (Admin) PowerShell console. Make sure the PowerShell execution policy is not set to Restricted
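The last step refers to the PowerShell execution policy. The following is a minimal sketch for checking it and, if needed, relaxing it for the current console session only. The choice of RemoteSigned and of the Process scope is an assumption, not something the toolkit prescribes.

```powershell
# Read the effective execution policy for this session.
$policy = Get-ExecutionPolicy
Write-Host "Current execution policy: $policy"

# Assumption: relaxing the policy for the current process only is acceptable.
# The change is discarded when the console is closed.
if ($policy -eq 'Restricted')
{
    Set-ExecutionPolicy -Scope Process -ExecutionPolicy RemoteSigned -Force
}
```

If the extracted scripts are still blocked because they were downloaded from the internet, running Unblock-File on them may be needed.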
Details
This section lists all the scripts in the toolkit and explains their purpose. Each script is deliberately designed to do one thing only, to avoid complex error handling. Each script includes a comment section with synopsis, description, example(s) and notes; this has been removed from the code sections below. Each script also contains variables that must be changed to match your environment. The scripts are listed alphabetically.
CountFiles.ps1
Count all files in a specific folder
function CountFiles
{
    Param(
        [string]$rootFolder
    )
    # List the immediate children of the folder (not recursive)
    $items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $rootFolder
    $count = 0
    Write-Host "Number of files in $rootFolder :"
    foreach ($item in $items)
    {
        # Count files only; subfolders are skipped
        if ($item.Type -eq "FILE")
        {
            $count += 1
        }
    }
    return $count
}
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$folder = "/user" #Replace value with the folder you want to count files in
CountFiles $folder
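The same count can also be expressed as a pipeline. The sketch below runs against stand-in objects so it can be tried without a Data Lake Store. The helper name Get-FileCount and the $sampleItems data are hypothetical; in the real script the items would come from Get-AzureRmDataLakeStoreChildItem.

```powershell
# Hypothetical helper: counts the entries whose Type is FILE.
function Get-FileCount
{
    Param(
        [object[]]$Items
    )

    # @() guarantees an array, so .Count is valid for zero or one match.
    return @($Items | Where-Object { $_.Type -eq "FILE" }).Count
}

# Stand-in objects that mimic the Name and Type properties used by the toolkit.
$sampleItems = @(
    [pscustomobject]@{ Name = "a.csv";  Type = "FILE" },
    [pscustomobject]@{ Name = "b.csv";  Type = "FILE" },
    [pscustomobject]@{ Name = "backup"; Type = "DIRECTORY" }
)

$fileCount = Get-FileCount -Items $sampleItems
Write-Host "Number of files: $fileCount"
```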
DeleteTMPFiles.ps1
Delete all files with a certain extension (configurable). Default value is TMP
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$rootFolder = "/user/myfolder" #Replace value with the folder you want to delete files from
$count = 0
# Select files (not folders) whose name ends with the extension; -like is case-insensitive
$items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $rootFolder | Where-Object { $_.Type -eq "FILE" -and $_.Name -like "*.TMP" } #Replace value with the extension you want to delete
foreach ($item in $items)
{
    $fileName = $item.Name
    Write-Host "Deleting $fileName"
    Remove-AzureRmDataLakeStoreItem -Account $dataLakeStoreName -Path "$rootFolder/$fileName" -Force
    $count += 1
}
Write-Host "`n$count file(s) were deleted"
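A filter that compares the file extension, rather than searching for the text anywhere in the name, leaves files such as my.TMP.backup.csv untouched. This is a minimal sketch of such a check. The helper name Test-HasExtension and the file names are invented for illustration.

```powershell
# Hypothetical helper: true when the name ends with the given extension.
function Test-HasExtension
{
    Param(
        [string]$Name,
        [string]$Extension   # including the dot, for example ".tmp"
    )

    # -eq on strings is case-insensitive, so ".TMP" and ".tmp" both match.
    return [System.IO.Path]::GetExtension($Name) -eq $Extension
}

$names = "data.TMP", "data.tmp", "my.TMP.backup.csv", "notes.txt"

# Keep only the names that would be selected for deletion.
$toDelete = @($names | Where-Object { Test-HasExtension -Name $_ -Extension ".tmp" })
$toDelete | ForEach-Object { Write-Host "Would delete $_" }
```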
DownloadFiles.ps1
Download all files in a folder to a folder on the local disk
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$rootFolder = "/user/myfolder" #Replace value with the folder you want to download
$downloadDest = "c:\temp\" #Replace value with the download destination folder
Export-AzureRmDataLakeStoreItem -Account $dataLakeStoreName -Path $rootFolder -Destination $downloadDest -Force -Recurse
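It may help to make sure the local destination folder exists before the export starts. This is a minimal sketch. The folder name under the temp directory is a hypothetical example and should be replaced with your own download destination.

```powershell
# Hypothetical destination below the user's temp directory.
$downloadDest = Join-Path ([System.IO.Path]::GetTempPath()) "adls-download"

# Create the folder only when it is missing.
if (-not (Test-Path -Path $downloadDest -PathType Container))
{
    New-Item -Path $downloadDest -ItemType Directory | Out-Null
}
Write-Host "Download destination ready: $downloadDest"
```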
GetFolderContent.ps1
List all files in a folder (recursive)
function GetFolderContent
{
    Param(
        [string]$rootFolder
    )
    $items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $rootFolder
    Write-Host "`nContents in $rootFolder"
    foreach ($item in $items)
    {
        if ($item.Type -eq "DIRECTORY")
        {
            $nextFolder = $item.Name
            # For the root folder, avoid a double slash in the child path
            if ($rootFolder -eq "/" -or $rootFolder -eq "\")
            {
                GetFolderContent "/$nextFolder"
            }
            else
            {
                GetFolderContent "$rootFolder/$nextFolder"
            }
        }
        if ($item.Type -eq "FILE")
        {
            Write-Host $item.Name
        }
    }
    return $null
}
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$rootFolder = "/user/myfolder" #Replace value with the folder you want to get contents of
GetFolderContent $rootFolder
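The recursion depends on building child paths correctly, including for the root folder. One way to sketch this is a small helper that joins a parent and a child with exactly one slash. The name Join-AdlsPath is hypothetical and is not part of the toolkit or the Azure module.

```powershell
# Hypothetical helper: joins two Data Lake Store path segments.
function Join-AdlsPath
{
    Param(
        [string]$Parent,
        [string]$Child
    )

    # Remove trailing slashes from the parent and leading slashes from the child,
    # so "/" joined with "user" gives "/user" rather than "//user".
    return $Parent.TrimEnd('/') + "/" + $Child.TrimStart('/')
}

$fromRoot   = Join-AdlsPath -Parent "/" -Child "user"
$fromFolder = Join-AdlsPath -Parent "/user/myfolder" -Child "sub"
Write-Host $fromRoot
Write-Host $fromFolder
```

With such a helper, the recursive scripts would not need a separate branch for the root folder.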
MoveFiles.ps1
Move all files in a folder to another folder
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$rootFolder = "/user/from" #Replace value with the folder you want to move files from
$destFolder = "/user/to" #Replace value with the folder you want to move files to
$count = 0
$items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $rootFolder
foreach ($item in $items)
{
    $fileName = $item.Name
    Write-Host "Moving $fileName to $destFolder"
    Move-AzureRmDataLakeStoreItem -Account $dataLakeStoreName -Path "$rootFolder/$fileName" -Destination "$destFolder/$fileName" -Force
    $count += 1
}
Write-Host "`n$count file(s) were moved"
RemoveFileExpiry.ps1
Remove file expiry on a file. The file will no longer be deleted after the expiration date is reached
function RemoveFileExpiry
{
    Param(
        [string]$fileName
    )
    Write-Host "Removing expiry on $fileName"
    # Calling the cmdlet without -Expiration clears any existing expiry
    Set-AzureRmDataLakeStoreItemExpiry -Account $dataLakeStoreName -Path $fileName
}
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$fileName = "/user/myfile.jpg" #Replace value with the file you want to remove expiry on
RemoveFileExpiry $fileName
RemoveFolderExpiry.ps1
Remove expiry on all files in a folder and its subfolders. The files will no longer be deleted when the expiration date is reached
function RemoveFolderExpiry
{
    Param(
        [string]$folderName
    )
    $items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $folderName
    foreach ($item in $items)
    {
        if ($item.Type -eq "DIRECTORY")
        {
            $nextFolder = $item.Name
            # For the root folder, avoid a double slash in the child path
            if ($folderName -eq "/" -or $folderName -eq "\")
            {
                RemoveFolderExpiry "/$nextFolder"
            }
            else
            {
                RemoveFolderExpiry "$folderName/$nextFolder"
            }
        }
        if ($item.Type -eq "FILE")
        {
            $fileName = $item.Name
            Write-Host "Removing expiry on $folderName/$fileName"
            # Calling the cmdlet without -Expiration clears any existing expiry
            Set-AzureRmDataLakeStoreItemExpiry -Account $dataLakeStoreName -Path "$folderName/$fileName"
            $global:count += 1
        }
    }
}
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$folderName = "/user/myfolder" #Replace value with the folder you want to remove expiry on
$global:count = 0
RemoveFolderExpiry $folderName
Write-Host "`nRemoved expiry on $count file(s)"
SearchForFile.ps1
Search for a file
function SearchForFile
{
    Param(
        [string]$rootFolder
    )
    $items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $rootFolder
    foreach ($item in $items)
    {
        if ($item.Type -eq "DIRECTORY")
        {
            $nextFolder = $item.Name
            # For the root folder, avoid a double slash in the child path
            if ($rootFolder -eq "/" -or $rootFolder -eq "\")
            {
                SearchForFile "/$nextFolder"
            }
            else
            {
                SearchForFile "$rootFolder/$nextFolder"
            }
        }
        if ($item.Type -eq "FILE")
        {
            # $searchString is set at script level and may contain wildcards
            if ($item.Name -like $searchString)
            {
                Write-Host $item.Name "found in" $rootFolder
            }
        }
    }
    return $null
}
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$rootFolder = "/user/myfolder" #Replace value with the folder you want to search in
$searchString = "*filename*" #Replace value with the file you want to search for. Asterisk allowed
SearchForFile $rootFolder
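The search relies on the -like operator, which supports wildcard patterns and ignores case. Here are a few examples with invented file names, to show what different search strings would match:

```powershell
$fileNames = "sales_2017.csv", "Sales_2018.CSV", "readme.txt"

# * matches any number of characters
$csvFiles = @($fileNames | Where-Object { $_ -like "*.csv" })

# ? matches exactly one character
$salesByYear = @($fileNames | Where-Object { $_ -like "sales_201?.csv" })

# Without wildcards, -like behaves as a case-insensitive exact match
$exact = @($fileNames | Where-Object { $_ -like "README.TXT" })

Write-Host "CSV files: $($csvFiles -join ', ')"
Write-Host "Sales files: $($salesByYear -join ', ')"
Write-Host "Exact match: $($exact -join ', ')"
```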
SetFileExpiry.ps1
Set expiry on a file. The file will be deleted after the expiration date is reached
function SetFileExpiry
{
    Param(
        [string]$fileName
    )
    $now = Get-Date
    Write-Host "Setting expiry on $fileName"
    # The file expires three months from now
    Set-AzureRmDataLakeStoreItemExpiry -Account $dataLakeStoreName -Path $fileName -Expiration $now.AddMonths(3) #Replace expiry as required
}
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$fileName = "/user/myfile.jpg" #Replace value with the file you want expiry on
SetFileExpiry $fileName
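The expiry does not have to be three months from now. This is a minimal sketch of other ways to build the value passed to -Expiration. The variable names and the dates are illustrative assumptions.

```powershell
$now = Get-Date

# Relative expiry dates, calculated from the current time
$inThreeMonths = $now.AddMonths(3)
$inThirtyDays  = $now.AddDays(30)

# A fixed expiry date (midnight on 1 January 2030, local time)
$fixedDate = [datetime]"2030-01-01"

Write-Host "Three months from now: $inThreeMonths"
Write-Host "Thirty days from now:  $inThirtyDays"
Write-Host "Fixed date:            $fixedDate"
```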
SetFolderExpiry.ps1
Set expiry on all files in a folder and its subfolders. The files will be deleted when the expiration date is reached
function SetFolderExpiry
{
    Param(
        [string]$folderName
    )
    $now = Get-Date
    $items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $folderName
    foreach ($item in $items)
    {
        if ($item.Type -eq "DIRECTORY")
        {
            $nextFolder = $item.Name
            # For the root folder, avoid a double slash in the child path
            if ($folderName -eq "/" -or $folderName -eq "\")
            {
                SetFolderExpiry "/$nextFolder"
            }
            else
            {
                SetFolderExpiry "$folderName/$nextFolder"
            }
        }
        if ($item.Type -eq "FILE")
        {
            $fileName = $item.Name
            Write-Host "Setting expiry on $folderName/$fileName"
            # The file expires three months from now; replace expiry as required
            Set-AzureRmDataLakeStoreItemExpiry -Account $dataLakeStoreName -Path "$folderName/$fileName" -Expiration $now.AddMonths(3)
            $global:count += 1
        }
    }
}
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$folderName = "/user/myfolder" #Replace value with the folder you want expiry on
$global:count = 0
SetFolderExpiry $folderName
Write-Host "`nSet expiry on $count file(s)"
UploadFiles.ps1
Upload files to the Data Lake Store
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$sourceFolder = "C:\temp" #Replace value with the folder you want to upload
$uploadDest = "/user/myfolder" #Replace value with the upload destination path
Import-AzureRmDataLakeStoreItem -Account $dataLakeStoreName -Path $sourceFolder -Destination $uploadDest -Force -Recurse
See Also
- Get started with Azure Data Lake Store using Azure PowerShell
- Overview of Azure Data Lake Store
- Introduction to Cortana Intelligence Suite
The TechNet Wiki itself also hosts an extensive collection of Cortana Intelligence Suite articles. The best entry point is Cortana Intelligence Suite Resources on the TechNet Wiki.