Quickstart: Create an Apache Spark cluster in Azure HDInsight using Azure CLI

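In this quickstart, you learn how to create an Apache Spark cluster in Azure HDInsight using the Azure CLI. Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. The Apache Spark framework for HDInsight uses in-memory processing to speed up data analytics and cluster computing. The Azure CLI is Microsoft's cross-platform command-line experience for managing Azure resources.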

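If you use multiple clusters together, you can create a virtual network, and if you use a Spark cluster, you can also use the Hive Warehouse Connector. For more information, see Plan a virtual network for Azure HDInsight and Integrate Apache Spark and Apache Hive with the Hive Warehouse Connector.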

If you don't have an Azure account, create a free account before you begin.

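Prerequisites

- Use the Bash environment in Azure Cloud Shell. For more information, see Get started with Azure Cloud Shell.
- If you prefer to run CLI reference commands locally, install the Azure CLI. If you're running on Windows or macOS, consider running the Azure CLI in a Docker container. For more information, see How to run the Azure CLI in a Docker container.
  - If you're using a local installation, sign in to the Azure CLI by using the az login command. To finish the authentication process, follow the steps displayed in your terminal. For other sign-in options, see Authenticate to Azure using Azure CLI.
  - When you're prompted, install the Azure CLI extension on first use. For more information about extensions, see Use and manage extensions with the Azure CLI.
  - Run az version to find the installed version and dependent libraries. To upgrade to the latest version, run az upgrade.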

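Create an Apache Spark cluster

Sign in to your Azure subscription. If you plan to use Azure Cloud Shell, select Try it in the upper-right corner of the following code block. Otherwise, enter the following command: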
```
az login

# If you have multiple subscriptions, set the one to use
# az account set --subscription "SUBSCRIPTIONID"
```
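
When the sign-in step is scripted, it can be convenient to skip az login if a session is already active. The following is a minimal sketch under that assumption; ensure_logged_in is a hypothetical helper name, not an Azure CLI command. It relies on az account show exiting with a non-zero status when no account is signed in.

```shell
# Hypothetical helper: sign in only when no Azure CLI session is active.
ensure_logged_in() {
    # `az account show` exits non-zero when no account is signed in.
    if az account show >/dev/null 2>&1; then
        echo "Already signed in."
    else
        az login
    fi
}

# Usage:
#   ensure_logged_in
```

In Azure Cloud Shell you are already authenticated, so the helper would normally take the first branch there.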

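Set the environment variables. The variables in this quickstart are based on Bash; other environments need slight variations. Replace RESOURCEGROUPNAME, LOCATION, CLUSTERNAME, STORAGEACCOUNTNAME, and PASSWORD in the following code snippet with the desired values. Then enter the CLI commands to set the environment variables.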

```
export resourceGroupName=RESOURCEGROUPNAME
export location=LOCATION
export clusterName=CLUSTERNAME
export AZURE_STORAGE_ACCOUNT=STORAGEACCOUNTNAME
export httpCredential='PASSWORD'
export sshCredentials='PASSWORD'

export AZURE_STORAGE_CONTAINER=$clusterName
export clusterSizeInNodes=1
export clusterVersion=4.0
export clusterType=spark
export componentVersion=Spark=2.3
```
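
Because the container name is taken from the cluster name, the value chosen for CLUSTERNAME also has to be a valid blob container name (lowercase letters, digits, and hyphens). The following sketch checks the values locally before any resource is created. The helper names are hypothetical. The storage account and container rules follow the Azure Storage naming rules; the password rule (at least 10 characters with a digit, an uppercase letter, a lowercase letter, and a non-alphanumeric character) is an assumption about the HDInsight policy and should be verified against the current documentation.

```shell
# Hypothetical helpers (not part of the Azure CLI) that check values locally.

# Storage account names: 3-24 characters, lowercase letters and digits only.
is_valid_storage_account_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

# Blob container names: 3-63 characters; lowercase letters, digits, and
# hyphens; start and end with a letter or digit; no consecutive hyphens.
is_valid_container_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$' || return 1
    case $1 in
        *--*) return 1 ;;
    esac
    return 0
}

# ASSUMED password policy: at least 10 characters, with at least one digit,
# one uppercase letter, one lowercase letter, and one other character.
is_strong_password() {
    [ "${#1}" -ge 10 ] || return 1
    printf '%s\n' "$1" | LC_ALL=C grep -q '[[:digit:]]' || return 1
    printf '%s\n' "$1" | LC_ALL=C grep -q '[[:upper:]]' || return 1
    printf '%s\n' "$1" | LC_ALL=C grep -q '[[:lower:]]' || return 1
    printf '%s\n' "$1" | LC_ALL=C grep -q '[^[:alnum:]]' || return 1
}

# Usage:
#   is_valid_storage_account_name "$AZURE_STORAGE_ACCOUNT" || echo "bad storage account name"
#   is_valid_container_name "$AZURE_STORAGE_CONTAINER"     || echo "bad container name"
#   is_strong_password "$httpCredential"                   || echo "weak password"
```

Each helper returns an exit status only, so it can be combined with || or used in an if statement.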

Enter the following command to create the resource group:

```
az group create \
    --location "$location" \
    --name "$resourceGroupName"
```
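
When the quickstart is rerun, the resource group may already exist, possibly in a different location. The following sketch checks first by using az group exists, which prints true or false, and creates the group only when it is missing. ensure_resource_group is a hypothetical helper name.

```shell
# Hypothetical helper: create the resource group only when it doesn't exist.
# $1 = resource group name, $2 = location
ensure_resource_group() {
    rg_name=$1
    rg_location=$2
    # `az group exists` prints "true" or "false".
    if [ "$(az group exists --name "$rg_name")" = "true" ]; then
        echo "Resource group $rg_name already exists; skipping creation."
    else
        az group create --location "$rg_location" --name "$rg_name"
    fi
}

# Usage:
#   ensure_resource_group "$resourceGroupName" "$location"
```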

Enter the following command to create the Azure Storage account:

```
az storage account create \
    --name "$AZURE_STORAGE_ACCOUNT" \
    --resource-group "$resourceGroupName" \
    --https-only true \
    --kind StorageV2 \
    --location "$location" \
    --sku Standard_LRS
```
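
Storage account names are globally unique, so creation fails when the name is already taken. As a sketch, the name can be checked beforehand with az storage account check-name. storage_name_available is a hypothetical helper name, and the case normalization is a precaution in case the tsv output renders the boolean as True rather than true.

```shell
# Hypothetical helper: succeed when the storage account name is available.
# $1 = storage account name
storage_name_available() {
    available=$(az storage account check-name --name "$1" \
        --query nameAvailable --output tsv) || return 1
    # Normalize case so that "True" and "true" are treated alike.
    available=$(printf '%s' "$available" | tr '[:upper:]' '[:lower:]')
    [ "$available" = "true" ]
}

# Usage:
#   storage_name_available "$AZURE_STORAGE_ACCOUNT" || echo "Pick another name."
```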

Enter the following command to retrieve the primary key from the Azure Storage account and store it in a variable:

```
# Quote the JMESPath query so that the shell doesn't treat [0] as a glob pattern.
export AZURE_STORAGE_KEY=$(az storage account keys list \
    --account-name "$AZURE_STORAGE_ACCOUNT" \
    --resource-group "$resourceGroupName" \
    --query '[0].value' -o tsv)
```
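
In Bash, export VAR=$(command) reports the exit status of export, so a failed key lookup can go unnoticed and leave the variable empty. The following sketch separates the lookup from the export and stops when no key is returned. fetch_storage_key is a hypothetical helper name.

```shell
# Hypothetical helper: print the first storage account key, or fail.
# $1 = storage account name, $2 = resource group name
fetch_storage_key() {
    key=$(az storage account keys list \
        --account-name "$1" \
        --resource-group "$2" \
        --query '[0].value' --output tsv) || {
        echo "Could not list keys for $1." >&2
        return 1
    }
    if [ -z "$key" ]; then
        echo "No key returned for $1." >&2
        return 1
    fi
    printf '%s\n' "$key"
}

# Usage:
#   AZURE_STORAGE_KEY=$(fetch_storage_key "$AZURE_STORAGE_ACCOUNT" "$resourceGroupName") || exit 1
#   export AZURE_STORAGE_KEY
```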

Enter the following command to create the Azure Storage container:

```
az storage container create \
    --name "$AZURE_STORAGE_CONTAINER" \
    --account-key "$AZURE_STORAGE_KEY" \
    --account-name "$AZURE_STORAGE_ACCOUNT"
```

Enter the following command to create the Apache Spark cluster:

```
# Variables are quoted so that passwords containing spaces or
# shell metacharacters are passed through unchanged.
az hdinsight create \
    --name "$clusterName" \
    --resource-group "$resourceGroupName" \
    --type "$clusterType" \
    --component-version "$componentVersion" \
    --http-password "$httpCredential" \
    --http-user admin \
    --location "$location" \
    --workernode-count "$clusterSizeInNodes" \
    --ssh-password "$sshCredentials" \
    --ssh-user sshuser \
    --storage-account "$AZURE_STORAGE_ACCOUNT" \
    --storage-account-key "$AZURE_STORAGE_KEY" \
    --storage-container "$AZURE_STORAGE_CONTAINER" \
    --version "$clusterVersion"
```
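
Cluster creation can take a while. The following sketch polls the cluster state by using az hdinsight show and then prints the endpoints for the new cluster. Both function names are hypothetical, and the state values Running and Error are assumptions about what properties.clusterState reports. The endpoint host names follow the CLUSTERNAME.azurehdinsight.net pattern used in the Azure public cloud.

```shell
# Hypothetical helper: poll until the cluster reports the Running state.
# $1 = cluster name, $2 = resource group,
# $3 = max attempts (default 60), $4 = seconds between attempts (default 60)
wait_for_cluster() {
    cluster=$1
    group=$2
    attempts=${3:-60}
    delay=${4:-60}
    i=0
    state=""
    while [ "$i" -lt "$attempts" ]; do
        state=$(az hdinsight show --name "$cluster" --resource-group "$group" \
            --query properties.clusterState --output tsv)
        case $state in
            Running) echo "Cluster $cluster is running."; return 0 ;;
            Error)   echo "Cluster $cluster reported an error." >&2; return 1 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "Timed out waiting for $cluster (last state: $state)." >&2
    return 1
}

# Hypothetical helper: print the endpoints for a cluster.
# $1 = cluster name, $2 = SSH user (default sshuser)
cluster_endpoints() {
    printf 'Cluster portal: https://%s.azurehdinsight.net\n' "$1"
    printf 'Jupyter: https://%s.azurehdinsight.net/jupyter\n' "$1"
    printf 'SSH: ssh %s@%s-ssh.azurehdinsight.net\n' "${2:-sshuser}" "$1"
}

# Usage:
#   wait_for_cluster "$clusterName" "$resourceGroupName" && cluster_endpoints "$clusterName"
```

Polling is mainly useful when the create command is started with --no-wait; without that flag, the command returns only after the operation finishes.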

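Clean up resources

After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster even when it isn't in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters that aren't in use.

Enter all or some of the following commands to remove resources: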

```
# Remove cluster
az hdinsight delete \
    --name "$clusterName" \
    --resource-group "$resourceGroupName"

# Remove storage container
az storage container delete \
    --account-name "$AZURE_STORAGE_ACCOUNT" \
    --name "$AZURE_STORAGE_CONTAINER"

# Remove storage account
az storage account delete \
    --name "$AZURE_STORAGE_ACCOUNT" \
    --resource-group "$resourceGroupName"

# Remove resource group
az group delete \
    --name "$resourceGroupName"
```
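
Deleting the resource group also removes every resource in it, including the cluster and the storage account, so a scripted cleanup usually needs only one of two paths. The following sketch wraps both; cleanup_quickstart is a hypothetical helper name. The --yes flag skips the confirmation prompt, so use it with care.

```shell
# Hypothetical helper: remove only the cluster, or everything in the group.
# $1 = mode (cluster | all), $2 = cluster name, $3 = resource group name
cleanup_quickstart() {
    mode=$1
    cluster=$2
    group=$3
    case $mode in
        cluster)
            # Keeps the storage account and its data.
            az hdinsight delete --name "$cluster" --resource-group "$group" --yes
            ;;
        all)
            # Deletes the group and all resources in it, without waiting.
            az group delete --name "$group" --yes --no-wait
            ;;
        *)
            echo "usage: cleanup_quickstart cluster|all CLUSTER GROUP" >&2
            return 2
            ;;
    esac
}

# Usage:
#   cleanup_quickstart cluster "$clusterName" "$resourceGroupName"
#   cleanup_quickstart all "$clusterName" "$resourceGroupName"
```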

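Next steps

In this quickstart, you learned how to create an Apache Spark cluster in Azure HDInsight using the Azure CLI. Advance to the next tutorial to learn how to use an HDInsight cluster to run interactive queries on sample data.

Run interactive queries on Apache Spark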

Last updated on 2025-05-20