Manage Azure Data Lake Analytics using Python

Article
12/20/2023

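Important

Azure Data Lake Analytics was retired on 29 February 2024. See the retirement announcement to learn more.

For data analytics, your organization can use Azure Synapse Analytics or Microsoft Fabric.

This article describes how to use Python to manage Azure Data Lake Analytics accounts, data sources, users, and jobs.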

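Supported Python versions

Use a 64-bit version of Python.
You can use the standard Python distribution found at Python.org downloads.
Many developers find it convenient to use the Anaconda Python distribution.
This article was written using Python version 3.6 from the standard Python distribution.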
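
To confirm that the interpreter you are about to use matches these requirements, a short check such as the following may help. It is a minimal sketch that uses only the standard library; the helper name `describe_interpreter` is hypothetical and not part of any Azure SDK.

```python
import struct
import sys


def describe_interpreter():
    """Return (pointer width in bits, (major, minor)) for the running Python."""
    # A C pointer ("P") is 8 bytes on a 64-bit build and 4 bytes on a 32-bit build.
    bits = struct.calcsize("P") * 8
    return bits, tuple(sys.version_info[:2])


if __name__ == "__main__":
    bits, version = describe_interpreter()
    print("Python {}.{} ({}-bit)".format(version[0], version[1], bits))
    if bits != 64:
        print("Warning: this article assumes a 64-bit Python.")
```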

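Install the Azure Python SDK

Install the following modules:

The azure-identity module provides the DefaultAzureCredential class used for authentication.
The azure-mgmt-resource module includes other Azure modules for Active Directory and related services.
The azure-datalake-store module includes the Azure Data Lake Store filesystem operations.
The azure-mgmt-datalake-store module includes the Azure Data Lake Store account management operations.
The azure-mgmt-datalake-analytics module includes the Azure Data Lake Analytics operations.

First, make sure that you have the latest pip by running the following command: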

python -m pip install --upgrade pip

This document was written using pip version 9.0.1.

Use the following pip commands to install the modules from the command line:

pip install azure-identity
pip install azure-mgmt-resource
pip install azure-datalake-store
pip install azure-mgmt-datalake-store
pip install azure-mgmt-datalake-analytics

Create a new Python script

Paste the following code into the script:

# Use this only for Azure AD service-to-service authentication
#from azure.common.credentials import ServicePrincipalCredentials

# Use this only for Azure AD end-user authentication
#from azure.common.credentials import UserPassCredentials

# Required for Azure Identity
from azure.identity import DefaultAzureCredential

# Required for Azure Resource Manager
from azure.mgmt.resource.resources import ResourceManagementClient
from azure.mgmt.resource.resources.models import ResourceGroup

# Required for Azure Data Lake Store account management
from azure.mgmt.datalake.store import DataLakeStoreAccountManagementClient
from azure.mgmt.datalake.store.models import DataLakeStoreAccount

# Required for Azure Data Lake Store filesystem management
from azure.datalake.store import core, lib, multithread

# Required for Azure Data Lake Analytics account management
from azure.mgmt.datalake.analytics.account import DataLakeAnalyticsAccountManagementClient
from azure.mgmt.datalake.analytics.account.models import DataLakeAnalyticsAccount, DataLakeStoreAccountInformation

# Required for Azure Data Lake Analytics job management
from azure.mgmt.datalake.analytics.job import DataLakeAnalyticsJobManagementClient
from azure.mgmt.datalake.analytics.job.models import JobInformation, JobState, USqlJobProperties

# Required for Azure Data Lake Analytics catalog management
from azure.mgmt.datalake.analytics.catalog import DataLakeAnalyticsCatalogManagementClient

# Required for Azure Data Lake Analytics Model
from azure.mgmt.datalake.analytics.account.models import CreateOrUpdateComputePolicyParameters

# Use these as needed for your application
import logging
import getpass
import pprint
import uuid
import time

Run this script to verify that the modules can be imported.
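
If an import fails, it can be useful to see which package is missing before reinstalling anything. The following is a minimal sketch that uses only the standard library. The helper name `find_missing_modules` is hypothetical, and the module list is taken from the import statements in the script above.

```python
import importlib.util

# Module paths imported by the script above.
REQUIRED_MODULES = [
    "azure.identity",
    "azure.mgmt.resource",
    "azure.datalake.store",
    "azure.mgmt.datalake.store",
    "azure.mgmt.datalake.analytics",
]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be located."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised when a parent package (for example "azure") is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing_modules(REQUIRED_MODULES):
        print("Missing module: " + name)
```

An empty result suggests that every package from the pip commands is installed in the current environment.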

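Authentication

Interactive user authentication with a pop-up

This method isn't supported.

Interactive user authentication with a device code

# UserPassCredentials must be imported first: uncomment its import
# (from azure.common.credentials) in the script above.
user = input(
    'Enter the user to authenticate with that has permission to subscription: ')
password = getpass.getpass()
credentials = UserPassCredentials(user, password)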

Noninteractive authentication with SPI and a secret

# Acquire a credential object for the app identity. When running in the cloud,
# DefaultAzureCredential uses the app's managed identity (MSI) or user-assigned service principal.
# When run locally, DefaultAzureCredential relies on environment variables named
# AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID.

credentials = DefaultAzureCredential()
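
When you run locally, a missing environment variable is a common reason for authentication to fail. The sketch below checks for the three variables named in the comment above before a credential is created. It uses only the standard library, and the helper name `missing_service_principal_vars` is hypothetical.

```python
import os

# Variable names taken from the comment in the snippet above.
SERVICE_PRINCIPAL_VARS = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID")


def missing_service_principal_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_service_principal_vars()
    if missing:
        print("Set these variables before authenticating: " + ", ".join(missing))
```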

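Noninteractive authentication with API and a certificate

This method isn't supported.

Common script variables

These variables are used in the samples.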

subid = '<Azure Subscription ID>'
rg = '<Azure Resource Group Name>'
location = '<Location>'  # e.g. 'eastus2'
adls = '<Azure Data Lake Store Account Name>'
adla = '<Azure Data Lake Analytics Account Name>'
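
Because these values are placeholders, it is easy to leave one unchanged by accident. As a rough sketch, the following helper reports settings that are empty or still look like a `<...>` placeholder. The function name `unfilled_settings` is hypothetical, and the check is only a convenience, not something the SDK requires.

```python
import re

# Matches values that are still a template placeholder such as '<Location>'.
_PLACEHOLDER = re.compile(r"^<.*>$")


def unfilled_settings(settings):
    """Return the sorted names whose values are empty or still '<...>' placeholders."""
    return sorted(
        name for name, value in settings.items()
        if not value or _PLACEHOLDER.match(value)
    )
```

For example, calling `unfilled_settings({'subid': subid, 'rg': rg, 'location': location, 'adls': adls, 'adla': adla})` before creating any clients lists the variables that still need real values.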

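Create the clients

resourceClient = ResourceManagementClient(credentials, subid)
# The store account client is used later to create the Data Lake Store account.
adlsAcctClient = DataLakeStoreAccountManagementClient(credentials, subid)
adlaAcctClient = DataLakeAnalyticsAccountManagementClient(credentials, subid)
adlaJobClient = DataLakeAnalyticsJobManagementClient(
    credentials, 'azuredatalakeanalytics.net')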

Create an Azure resource group

armGroupResult = resourceClient.resource_groups.create_or_update(
    rg, ResourceGroup(location=location))

Create a Data Lake Analytics account

First, create a store account.

adlsAcctResult = adlsAcctClient.account.begin_create(
    rg,
    adls,
    DataLakeStoreAccount(
        location=location)
).wait()

Next, create an ADLA account that uses that store.

adlaAcctResult = adlaAcctClient.account.create(
    rg,
    adla,
    DataLakeAnalyticsAccount(
        location=location,
        default_data_lake_store_account=adls,
        data_lake_store_accounts=[DataLakeStoreAccountInformation(name=adls)]
    )
).wait()

Submit a job

script = """
@a  = 
    SELECT * FROM 
        (VALUES
            ("Contoso", 1500.0),
            ("Woodgrove", 2700.0)
        ) AS 
              D( customer, amount );
OUTPUT @a
    TO "/data.csv"
    USING Outputters.Csv();
"""

jobId = str(uuid.uuid4())
jobResult = adlaJobClient.job.create(
    adla,
    jobId,
    JobInformation(
        name='Sample Job',
        type='USql',
        properties=USqlJobProperties(script=script)
    )
)

Wait for a job to end

jobResult = adlaJobClient.job.get(adla, jobId)
while jobResult.state != JobState.ended:
    print('Job is not yet done, waiting for 3 seconds. Current state: ' +
          jobResult.state.value)
    time.sleep(3)
    jobResult = adlaJobClient.job.get(adla, jobId)

print('Job finished with result: ' + jobResult.result.value)
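
The loop above polls until the job ends, however long that takes. If you want an upper bound on the wait, one option is a generic polling helper like the sketch below. The name `wait_for_state` and its parameters are hypothetical, and the helper uses only the standard library.

```python
import time


def wait_for_state(get_state, target_state, poll_seconds=3.0, timeout_seconds=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns target_state.

    Returns True when the target state is reached, or False on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        if get_state() == target_state:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

With the clients from this article, a call might look like `wait_for_state(lambda: adlaJobClient.job.get(adla, jobId).state, JobState.ended)`. The `sleep` and `clock` parameters exist only so that the helper can be exercised without real delays.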

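List pipelines and recurrences

Depending on whether your jobs have pipeline or recurrence metadata attached, you can list pipelines and recurrences.

pipelines = adlaJobClient.pipeline.list(adla)
for p in pipelines:
    # The Python models expose snake_case attribute names.
    print('Pipeline: ' + p.name + ' ' + p.pipeline_id)

recurrences = adlaJobClient.recurrence.list(adla)
for r in recurrences:
    print('Recurrence: ' + r.name + ' ' + r.recurrence_id)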

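Manage compute policies

The DataLakeAnalyticsAccountManagementClient object provides methods for managing the compute policies for a Data Lake Analytics account.

List compute policies

The following code retrieves a list of compute policies for a Data Lake Analytics account.

policies = adlaAcctClient.compute_policies.list_by_account(rg, adla)
for p in policies:
    # The AU and priority fields are numeric, so format them instead of concatenating.
    print('Name: {}, Type: {}, Max AUs / job: {}, Min priority / job: {}'.format(
        p.name, p.object_type,
        p.max_degree_of_parallelism_per_job, p.min_priority_per_job))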
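
The AU and priority fields appear to be integers (the values 50 and 250 used later in this article suggest so), and Python raises a TypeError when an integer is concatenated to a string with `+`. The sketch below formats a policy with `str.format` instead. The `format_policy` helper is hypothetical, and the sample object is a stand-in with made-up values rather than a live SDK result.

```python
from types import SimpleNamespace


def format_policy(policy):
    """Build a one-line summary of a compute policy object."""
    return "Name: {} | Type: {} | Max AUs / job: {} | Min priority / job: {}".format(
        policy.name,
        policy.object_type,
        policy.max_degree_of_parallelism_per_job,
        policy.min_priority_per_job,
    )


# Stand-in object with hypothetical values, used here instead of a live SDK result.
sample = SimpleNamespace(
    name="GaryMcDaniel",
    object_type="User",
    max_degree_of_parallelism_per_job=50,
    min_priority_per_job=250,
)
print(format_policy(sample))
```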

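Create a new compute policy

The following code creates a new compute policy for a Data Lake Analytics account. It sets the maximum number of AUs available to the specified user to 50 and the minimum job priority to 250.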

userAadObjectId = "3b097601-4912-4d41-b9d2-78672fc2acde"
newPolicyParams = CreateOrUpdateComputePolicyParameters(
    userAadObjectId, "User", 50, 250)
adlaAcctClient.compute_policies.create_or_update(
    rg, adla, "GaryMcDaniel", newPolicyParams)

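Next steps

To see the same tutorial using other tools, select the tab selectors at the top of the page.
To learn U-SQL, see Get started with Azure Data Lake Analytics U-SQL language.
For management tasks, see Manage Azure Data Lake Analytics using the Azure portal.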
