Create Apache Hadoop clusters using the Azure REST API

Learn how to create an HDInsight cluster using an Azure Resource Manager template and the Azure REST API.

The Azure REST API allows you to perform management operations on services hosted in the Azure platform, including the creation of new resources such as HDInsight clusters.

Note

The steps in this document use the curl (https://curl.haxx.se/) utility to communicate with the Azure REST API.

Create a template

Azure Resource Manager templates are JSON documents that describe a resource group and all resources in it (such as HDInsight). This template-based approach allows you to define the resources that you need for HDInsight in one template.

The following JSON document is a merger of the template and parameters files from https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.hdinsight/hdinsight-linux-ssh-password/azuredeploy.json, which creates a Linux-based cluster using a password to secure the SSH user account.

{
    "properties": {
        "template": {
            "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
            "contentVersion": "1.0.0.0",
            "parameters": {
                "clusterType": {
                    "type": "string",
                    "allowedValues": ["hadoop",
                    "hbase",
                    "spark"],
                    "metadata": {
                        "description": "The type of the HDInsight cluster to create."
                    }
                },
                "clusterName": {
                    "type": "string",
                    "metadata": {
                        "description": "The name of the HDInsight cluster to create."
                    }
                },
                "clusterLoginUserName": {
                    "type": "string",
                    "metadata": {
                        "description": "These credentials can be used to submit jobs to the cluster and to log into cluster dashboards."
                    }
                },
                "clusterLoginPassword": {
                    "type": "securestring",
                    "metadata": {
                        "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
                    }
                },
                "sshUserName": {
                    "type": "string",
                    "metadata": {
                        "description": "These credentials can be used to remotely access the cluster."
                    }
                },
                "sshPassword": {
                    "type": "securestring",
                    "metadata": {
                        "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
                    }
                },
                "clusterStorageAccountName": {
                    "type": "string",
                    "metadata": {
                        "description": "The name of the storage account to be created and be used as the cluster's storage."
                    }
                },
                "clusterWorkerNodeCount": {
                    "type": "int",
                    "defaultValue": 4,
                    "metadata": {
                        "description": "The number of nodes in the HDInsight cluster."
                    }
                }
            },
            "variables": {
                "defaultApiVersion": "2015-05-01-preview",
                "clusterApiVersion": "2015-03-01-preview"
            },
            "resources": [{
                "name": "[parameters('clusterStorageAccountName')]",
                "type": "Microsoft.Storage/storageAccounts",
                "location": "[resourceGroup().location]",
                "apiVersion": "[variables('defaultApiVersion')]",
                "dependsOn": [],
                "tags": {

                },
                "properties": {
                    "accountType": "Standard_LRS"
                }
            },
            {
                "name": "[parameters('clusterName')]",
                "type": "Microsoft.HDInsight/clusters",
                "location": "[resourceGroup().location]",
                "apiVersion": "[variables('clusterApiVersion')]",
                "dependsOn": ["[concat('Microsoft.Storage/storageAccounts/',parameters('clusterStorageAccountName'))]"],
                "tags": {

                },
                "properties": {
                    "clusterVersion": "3.6",
                    "osType": "Linux",
                    "clusterDefinition": {
                        "kind": "[parameters('clusterType')]",
                        "configurations": {
                            "gateway": {
                                "restAuthCredential.isEnabled": true,
                                "restAuthCredential.username": "[parameters('clusterLoginUserName')]",
                                "restAuthCredential.password": "[parameters('clusterLoginPassword')]"
                            }
                        }
                    },
                    "storageProfile": {
                        "storageaccounts": [{
                            "name": "[concat(parameters('clusterStorageAccountName'),'.blob.core.windows.net')]",
                            "isDefault": true,
                            "container": "[parameters('clusterName')]",
                            "key": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', parameters('clusterStorageAccountName')), variables('defaultApiVersion')).key1]"
                        }]
                    },
                    "computeProfile": {
                        "roles": [{
                            "name": "headnode",
                            "targetInstanceCount": "2",
                            "hardwareProfile": {
                                "vmSize": "{}" 
                            },
                            "osProfile": {
                                "linuxOperatingSystemProfile": {
                                    "username": "[parameters('sshUserName')]",
                                    "password": "[parameters('sshPassword')]"
                                }
                            }
                        },
                        {
                            "name": "workernode",
                            "targetInstanceCount": "[parameters('clusterWorkerNodeCount')]",
                            "hardwareProfile": {
                                "vmSize": "{}"
                            },
                            "osProfile": {
                                "linuxOperatingSystemProfile": {
                                    "username": "[parameters('sshUserName')]",
                                    "password": "[parameters('sshPassword')]"
                                }
                            }
                        }]
                    }
                }
            }],
            "outputs": {
                "cluster": {
                    "type": "object",
                    "value": "[reference(resourceId('Microsoft.HDInsight/clusters',parameters('clusterName')))]"
                }
            }
        },
        "mode": "incremental",
        "Parameters": {
            "clusterName": {
                "value": "newclustername"
            },
            "clusterType": {
                "value": "hadoop"
            },
            "clusterStorageAccountName": {
                "value": "newstoragename"
            },
            "clusterLoginUserName": {
                "value": "admin"
            },
            "clusterLoginPassword": {
                "value": "changeme"
            },
            "sshUserName": {
                "value": "sshuser"
            },
            "sshPassword": {
                "value": "changeme"
            }
        }
    }
}

This example is used in the steps in this document. Replace the example values in the Parameters section with the values for your cluster.
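
The parameter descriptions in the template state a password rule: at least 10 characters, with at least one digit, one non-alphanumeric character, and one letter. The following shell sketch checks a candidate value against that rule before you put it in the Parameters section. The function name check_password is hypothetical and not part of any Azure tool, and the service may enforce further rules that this check doesn't cover.

```shell
# Hypothetical helper: returns 0 if the password satisfies the rule quoted
# in the template's parameter descriptions, and 1 otherwise.
check_password() {
    pw=$1
    # At least 10 characters.
    [ "${#pw}" -ge 10 ] || return 1
    # At least one digit (keep only digits, then test for non-empty).
    [ -n "$(printf '%s' "$pw" | tr -cd '0-9')" ] || return 1
    # At least one upper or lower case letter.
    [ -n "$(printf '%s' "$pw" | tr -cd 'A-Za-z')" ] || return 1
    # At least one non-alphanumeric character (delete alphanumerics, then test).
    [ -n "$(printf '%s' "$pw" | tr -d 'A-Za-z0-9')" ] || return 1
    return 0
}
```

For example, check_password 'changeme' fails because the value is shorter than 10 characters, so the placeholder passwords in the example must be replaced before you deploy.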

Important

The template uses the default number of worker nodes (4) for an HDInsight cluster. If you plan on more than 32 worker nodes, you must select a head node size with at least 8 cores and 14 GB of RAM.

For more information on node sizes and associated costs, see HDInsight pricing.

Sign in to your Azure subscription

Follow the steps documented in Get started with Azure CLI and connect to your subscription using the az login command.

Create a service principal

Note

These steps are an abridged version of the Create service principal with password section of the Use Azure CLI to create a service principal to access resources document. These steps create a service principal that is used to authenticate to the Azure REST API.

  1. From a command line, use the following command to list your Azure subscriptions.

    az account list --query '[].{Subscription_ID:id,Tenant_ID:tenantId,Name:name}'  --output table
    

    In the list, select the subscription that you want to use and note the Subscription_ID and Tenant_ID columns. Save these values.

  2. Use the following command to create an application in Microsoft Entra ID.

    az ad app create --display-name "exampleapp" --homepage "https://www.contoso.org" --identifier-uris "https://www.contoso.org/example" --password <Your password> --query 'appId'
    

    Replace the values for --display-name, --homepage, and --identifier-uris with your own values. Provide a password for the new Microsoft Entra ID entry.

    Note

    The --homepage and --identifier-uris values don't need to reference an actual web page hosted on the internet. They must be unique URIs.

    The value returned from this command is the App ID for the new application. Save this value.

  3. Use the following command to create a service principal using the App ID.

    az ad sp create --id <App ID> --query 'objectId'
    

    The value returned from this command is the Object ID. Save this value.

  4. Assign the Owner role to the service principal using the Object ID value. Use the subscription ID you obtained earlier.

    az role assignment create --assignee <Object ID> --role Owner --scope /subscriptions/<Subscription ID>/
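
Later steps refer to the subscription and tenant IDs as $SUBSCRIPTIONID and $TENANTID. As a minimal sketch, assuming you've already run az login and the subscription you want is the active one, the following function captures both values with az account show. The function name load_azure_ids is hypothetical.

```shell
# Hypothetical helper: stores the active subscription's ID and tenant ID
# in the variables that the later curl commands expect.
load_azure_ids() {
    # --query selects one field; --output tsv prints it without JSON quotes.
    SUBSCRIPTIONID=$(az account show --query id --output tsv) || return 1
    TENANTID=$(az account show --query tenantId --output tsv) || return 1
    export SUBSCRIPTIONID TENANTID
}
```

After calling load_azure_ids, you still need to set $APPID and $PASSWORD yourself from the values used when creating the application.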
    

Get an authentication token

Use the following command to retrieve an authentication token:

curl -X "POST" "https://login.microsoftonline.com/$TENANTID/oauth2/token" \
-H "Cookie: flight-uxoptin=true; stsservicecookie=ests; x-ms-gateway-slice=productionb; stsservicecookie=ests" \
-H "Content-Type: application/x-www-form-urlencoded" \
--data-urlencode "client_id=$APPID" \
--data-urlencode "grant_type=client_credentials" \
--data-urlencode "client_secret=$PASSWORD" \
--data-urlencode "resource=https://management.azure.com/"

Set $TENANTID, $APPID, and $PASSWORD to the values obtained or used previously.

If this request is successful, you receive a 200 series response and the response body contains a JSON document.

The JSON document returned by this request contains an element named access_token. The value of access_token is used to authenticate requests to the REST API.

{
    "token_type":"Bearer",
    "expires_in":"3599",
    "expires_on":"1463409994",
    "not_before":"1463406094",
    "resource":"https://management.azure.com/",
    "access_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik1uQ19WWoNBVGZNNXBPWWlKSE1iYTlnb0VLWSIsImtpZCI6Ik1uQ19WWmNBVGZNNXBPWWlKSE1iYTlnb0VLWSJ9.eyJhdWQiOiJodHRwczovL21hbmFnZW1lbnQuYXp1cmUuY29tLyIsImlzcyI6Imh0dHBzOi8vc3RzLndpbmRvd3MubmV0LzcyZjk4OGJmLTg2ZjEtNDFhZi05MWFiLTJkN2NkMDExZGI2Ny8iLCJpYXQiOjE0NjM0MDYwOTQsIm5iZiI6MTQ2MzQwNjA5NCwiZXhwIjoxNDYzNDA5OTk5LCJhcHBpZCI6IjBlYzcyMzM0LTZkMDMtNDhmYi04OWU1LTU2NTJiODBiZDliYiIsImFwcGlkYWNyIjoiMSIsImlkcCI6Imh0dHBzOi8vc3RzLndpbmRvd3MubmV0LzcyZjk4OGJmLTg2ZjEtNDFhZi05MWFiLTJkN2NkMDExZGI0Ny8iLCJvaWQiOiJlNjgxZTZiMi1mZThkLTRkZGUtYjZiMS0xNjAyZDQyNWQzOWYiLCJzdWIiOiJlNjgxZTZiMi1mZThkLTRkZGUtYjZiMS0xNjAyZDQyNWQzOWYiLCJ0aWQiOiI3MmY5ODhiZi04NmYxLTQxYWYtOTFhYi0yZDdjZDAxMWRiNDciLCJ2ZXIiOiIxLjAifQ.nJVERbeDHLGHn7ZsbVGBJyHOu2PYhG5dji6F63gu8XN2Cvol3J1HO1uB4H3nCSt9DTu_jMHqAur_NNyobgNM21GojbEZAvd0I9NY0UDumBEvDZfMKneqp7a_cgAU7IYRcTPneSxbD6wo-8gIgfN9KDql98b0uEzixIVIWra2Q1bUUYETYqyaJNdS4RUmlJKNNpENllAyHQLv7hXnap1IuzP-f5CNIbbj9UgXxLiOtW5JhUAwWLZ3-WMhNRpUO2SIB7W7tQ0AbjXw3aUYr7el066J51z5tC1AK9UC-mD_fO_HUP6ZmPzu5gLA6DxkIIYP3grPnRVoUDltHQvwgONDOw"
}
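
The later requests expect the token in $ACCESSTOKEN. As a rough sketch of pulling the access_token value out of the JSON response, the following function uses only tr and sed. The function name extract_access_token is hypothetical, and the pattern assumes the token value contains no escaped quotes. A JSON-aware tool such as jq would be more robust if you have one installed.

```shell
# Hypothetical helper: reads the token response JSON on stdin and prints
# the value of its access_token element.
extract_access_token() {
    # Join the response into one line, then capture the quoted value that
    # follows the "access_token" key.
    tr -d '\n' \
        | sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

If you save the response body to a file, for example token.json (an assumed file name), you can set the variable with ACCESSTOKEN=$(extract_access_token < token.json).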

Create a resource group

Use the following command to create a resource group.

  • Set $SUBSCRIPTIONID to the subscription ID received while creating the service principal.
  • Set $ACCESSTOKEN to the access token received in the previous step.
  • Replace DATACENTERLOCATION with the data center in which you want to create the resource group and its resources. For example, 'South Central US'.
  • Set $RESOURCEGROUPNAME to the name you wish to use for this group.

curl -X "PUT" "https://management.azure.com/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME?api-version=2015-01-01" \
    -H "Authorization: Bearer $ACCESSTOKEN" \
    -H "Content-Type: application/json" \
    -d $'{
"location": "DATACENTERLOCATION"
}'

If this request is successful, you receive a 200 series response and the response body contains a JSON document containing information about the group. The "provisioningState" element contains a value of "Succeeded".
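
If you prefer to pass the location as an argument instead of editing DATACENTERLOCATION by hand, the request can be wrapped as in the following sketch. The function names resource_group_body and create_resource_group are hypothetical, and the body builder assumes the location contains no double quotes or backslashes.

```shell
# Hypothetical helper: prints the JSON request body for the given location.
resource_group_body() {
    printf '{"location": "%s"}' "$1"
}

# Hypothetical helper: creates the resource group in the given location.
# Relies on $SUBSCRIPTIONID, $RESOURCEGROUPNAME, and $ACCESSTOKEN being set.
create_resource_group() {
    curl -X PUT "https://management.azure.com/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME?api-version=2015-01-01" \
        -H "Authorization: Bearer $ACCESSTOKEN" \
        -H "Content-Type: application/json" \
        -d "$(resource_group_body "$1")"
}
```

For example, create_resource_group 'South Central US' sends the same request as the command above with the location filled in.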

Create a deployment

Use the following command to deploy the template to the resource group.

  • Set $DEPLOYMENTNAME to the name you wish to use for this deployment.

curl -X "PUT" "https://management.azure.com/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME/providers/microsoft.resources/deployments/$DEPLOYMENTNAME?api-version=2015-01-01" \
-H "Authorization: Bearer $ACCESSTOKEN" \
-H "Content-Type: application/json" \
-d "{set your body string to the template and parameters}"

Note

If you saved the template to a file, you can use the following option instead of -d "{set your body string to the template and parameters}":

--data-binary "@/path/to/file.json"

If this request is successful, you receive a 200 series response and the response body contains a JSON document containing information about the deployment operation.
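
To check for the 200 series response in a script, one option is to have curl print the HTTP status code. The following sketch submits the template from a file and tests the status. The function names is_2xx and submit_deployment and the output file name deployment-response.json are hypothetical. The curl options -s, -o, and -w '%{http_code}' are standard curl options.

```shell
# Hypothetical helper: succeeds if the argument is a 200 series status code.
is_2xx() {
    case $1 in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: submits the deployment using the template file in $1.
# Relies on $SUBSCRIPTIONID, $RESOURCEGROUPNAME, $DEPLOYMENTNAME, and
# $ACCESSTOKEN being set.
submit_deployment() {
    template_file=$1
    url="https://management.azure.com/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME/providers/microsoft.resources/deployments/$DEPLOYMENTNAME?api-version=2015-01-01"
    # Save the response body to a file and capture only the status code.
    status=$(curl -s -o deployment-response.json -w '%{http_code}' -X PUT "$url" \
        -H "Authorization: Bearer $ACCESSTOKEN" \
        -H "Content-Type: application/json" \
        --data-binary "@$template_file")
    if is_2xx "$status"; then
        echo "Deployment accepted (HTTP $status)"
    else
        echo "Deployment request failed (HTTP $status)" >&2
        return 1
    fi
}
```

On failure, the saved response body usually contains an error description that helps identify the problem.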

Important

The deployment has been submitted, but has not completed. It can take several minutes, usually around 15, for the deployment to complete.

Check the status of a deployment

To check the status of the deployment, use the following command:

curl -X "GET" "https://management.azure.com/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME/providers/microsoft.resources/deployments/$DEPLOYMENTNAME?api-version=2015-01-01" \
-H "Authorization: Bearer $ACCESSTOKEN" \
-H "Content-Type: application/json"

This command returns a JSON document containing information about the deployment operation. The "provisioningState" element contains the status of the deployment. If this element contains a value of "Succeeded", then the deployment has completed successfully.
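
Because the deployment takes several minutes, you may want to repeat the status request until it reaches a final state. The following sketch polls at a fixed interval and stops on "Succeeded". The function names provisioning_state and wait_for_deployment are hypothetical, the defaults of 30 seconds and 60 polls are arbitrary, and treating "Failed" and "Canceled" as final states is an assumption about the values the service returns. The parser takes the first provisioningState element found in the response.

```shell
# Hypothetical helper: prints the first provisioningState value found in
# the JSON read from stdin.
provisioning_state() {
    grep -o '"provisioningState"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | head -n 1 \
        | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

# Hypothetical helper: polls the deployment until it reaches a final state.
# $1: seconds between polls (default 30); $2: maximum polls (default 60).
# Relies on $SUBSCRIPTIONID, $RESOURCEGROUPNAME, $DEPLOYMENTNAME, and
# $ACCESSTOKEN being set.
wait_for_deployment() {
    interval=${1:-30}
    attempts=${2:-60}
    url="https://management.azure.com/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME/providers/microsoft.resources/deployments/$DEPLOYMENTNAME?api-version=2015-01-01"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        state=$(curl -s -X GET "$url" -H "Authorization: Bearer $ACCESSTOKEN" | provisioning_state)
        echo "provisioningState: $state"
        case $state in
            Succeeded) return 0 ;;
            Failed|Canceled) return 1 ;;
        esac
        i=$((i + 1))
        sleep "$interval"
    done
    echo "Gave up after $attempts polls" >&2
    return 2
}
```

Keep in mind that the access token expires (the example response shows expires_in of 3599 seconds), so a long wait may require requesting a new token.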

Troubleshoot

If you run into issues with creating HDInsight clusters, see access control requirements.

Next steps

Now that you've successfully created an HDInsight cluster, use the following to learn how to work with your cluster.

Apache Hadoop clusters

Apache HBase clusters