Manage HDInsight clusters by using the Apache Ambari REST API
Learn how to use the Apache Ambari REST API to manage and monitor Apache Hadoop clusters in Azure HDInsight.
What is Apache Ambari?
Apache Ambari simplifies the management and monitoring of Hadoop clusters by providing an easy-to-use web UI backed by its REST APIs. Ambari is provided by default with Linux-based HDInsight clusters.
Prerequisites
Bash on Ubuntu on Windows 10. The examples in this article use the Bash shell on Windows 10; see the Windows Subsystem for Linux Installation Guide for Windows 10 for installation steps. Other Unix shells work as well, and with some slight modifications the examples can run from a Windows command prompt or in Windows PowerShell. The Bash examples also rely on curl and on jq, a command-line JSON processor.
Base Uniform Resource Identifier for Ambari REST API
The base Uniform Resource Identifier (URI) for the Ambari REST API on HDInsight is https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME, where CLUSTERNAME is the name of your cluster. Cluster names in URIs are case-sensitive. While the cluster name in the fully qualified domain name (FQDN) part of the URI (CLUSTERNAME.azurehdinsight.net) is case-insensitive, other occurrences in the URI are case-sensitive.
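For example, for a cluster named MyCluster, https://mycluster.azurehdinsight.net/api/v1/clusters/MyCluster is valid, but a request ending in /clusters/mycluster can fail because the trailing cluster name doesn't match the actual casing.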
Authentication
Connecting to Ambari on HDInsight requires HTTPS. Use the admin account name (the default is admin) and password you provided during cluster creation.
For Enterprise Security Package clusters, instead of admin, use a fully qualified username like username@domain.onmicrosoft.com.
Examples
Setup (Preserve credentials)
Preserve your credentials to avoid reentering them for each example. The cluster name is preserved in a separate step.
A. Bash
Edit the script by replacing PASSWORD with your actual password. Then enter the command.
Bash
export password='PASSWORD'
B. PowerShell
PowerShell
$creds = Get-Credential -UserName "admin" -Message "Enter the HDInsight login"
Identify correctly cased cluster name
The actual casing of the cluster name may be different than you expect. The following steps show the actual casing, and then store it in a variable for all later examples.
Edit the scripts to replace CLUSTERNAME with your cluster name. Then enter the command. (The cluster name for the FQDN isn't case-sensitive.)
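A minimal Bash sketch, using the /api/v1/clusters endpoint to read back the name exactly as Ambari stores it:
Bash
export clusterName=$(curl -u admin:$password -sS -G "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters" | jq -r '.items[].Clusters.cluster_name')
echo $clusterName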
Get the FQDN of cluster nodes
You may need to know the fully qualified domain name (FQDN) of a cluster node. You can easily retrieve the FQDN for the various nodes in the cluster using the following examples:
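For example, this sketch lists the head nodes by querying the HDFS NAMENODE component; worker nodes can be listed the same way through the DATANODE component (the component paths assume the standard HDInsight service layout):
Bash
# FQDNs of the hosts running the HDFS NameNode (the head nodes)
curl -u admin:$password -sS -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/HDFS/components/NAMENODE" \
| jq -r '.host_components[].HostRoles.host_name'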
Get the internal IP address of cluster nodes
The IP addresses returned by the examples in this section aren't directly accessible over the internet. They're only accessible within the Azure Virtual Network that contains the HDInsight cluster.
To find the IP address, you must know the internal fully qualified domain name (FQDN) of the cluster nodes. Once you have the FQDN, you can then get the IP address of the host. The following examples first query Ambari for the FQDN of all the host nodes, and then query Ambari for the IP address of each host.
Bash
for HOSTNAME in $(curl -u admin:$password -sS -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/hosts" | jq -r '.items[].Hosts.host_name')
do
    IP=$(curl -u admin:$password -sS -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/hosts/$HOSTNAME" | jq -r '.Hosts.ip')
    echo "$HOSTNAME <--> $IP"
done
Get the default storage
HDInsight clusters must use an Azure Storage account or Data Lake Storage as the default storage. You can use Ambari to retrieve this information after the cluster has been created, for example, if you want to read or write data to the container outside HDInsight.
The following examples retrieve the default storage configuration from the cluster:
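A Bash sketch, assuming the default file system is recorded in the fs.defaultFS property of the HDFS service configuration:
Bash
curl -u admin:$password -sS -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/configurations/service_config_versions?service_name=HDFS&service_config_version=1" \
| jq -r '.items[].configurations[].properties["fs.defaultFS"] | select(. != null)'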
These examples return the first configuration applied to the server (service_config_version=1), which contains this information. If you retrieve a value that has been modified after cluster creation, you may need to list the configuration versions and retrieve the latest one.
The return value is similar to one of the following examples:
wasbs://CONTAINER@ACCOUNTNAME.blob.core.windows.net - This value indicates that the cluster is using an Azure Storage account for default storage. The ACCOUNTNAME value is the name of the storage account. The CONTAINER portion is the name of the blob container in the storage account. The container is the root of the HDFS compatible storage for the cluster.
abfs://CONTAINER@ACCOUNTNAME.dfs.core.windows.net - This value indicates that the cluster is using Azure Data Lake Storage Gen2 for default storage. The ACCOUNTNAME and CONTAINER values have the same meanings as for Azure Storage mentioned previously.
adl://home - This value indicates that the cluster is using Azure Data Lake Storage Gen1 for default storage.
To find the Data Lake Storage account name, use the following examples:
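A sketch assuming the Data Lake Storage Gen1 details are stored in the dfs.adls.home.hostname and dfs.adls.home.mountpoint properties:
Bash
# Account name (returns a value such as ACCOUNTNAME.azuredatalakestore.net)
curl -u admin:$password -sS -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/configurations/service_config_versions?service_name=HDFS&service_config_version=1" \
| jq -r '.items[].configurations[].properties["dfs.adls.home.hostname"] | select(. != null)'

# Directory within the account that holds the cluster's file system
curl -u admin:$password -sS -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/configurations/service_config_versions?service_name=HDFS&service_config_version=1" \
| jq -r '.items[].configurations[].properties["dfs.adls.home.mountpoint"] | select(. != null)'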
The return value is similar to /clusters/CLUSTERNAME/. This value is a path within the Data Lake Storage account. This path is the root of the HDFS compatible file system for the cluster.
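Update Ambari configuration
Get the current configuration.
Ambari stores the active configuration as the "desired configuration". A sketch that retrieves it through the standard Ambari fields parameter:
Bash
curl -u admin:$password -sS -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName?fields=Clusters/desired_configs"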
This example returns a JSON document containing the current configuration for installed components. Note the tag value associated with each configuration type. On a Spark cluster type, for example, the livy2-conf configuration initially carries the tag INITIAL.
Get the configuration for the component that you're interested in. In the following example, replace INITIAL with the tag value returned from the previous request.
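For example, the following sketch retrieves the Livy configuration (type livy2-conf), which is the configuration edited later in this walkthrough:
Bash
curl -u admin:$password -sS -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/configurations?type=livy2-conf&tag=INITIAL"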
Create newconfig.json.
jq is used to turn the data retrieved from HDInsight into a new configuration template. Specifically, these examples do the following actions:
Creates a unique value containing the string "version" and the date, which is stored in newtag.
Creates a root document for the new configuration.
Gets the contents of the .items[] array and adds it under the desired_config element.
Deletes the href, version, and Config elements, as these elements aren't needed to submit a new configuration.
Adds a tag element with a value of version#################. The numeric portion is based on the current date. Each configuration must have a unique tag.
Finally, the data is saved to the newconfig.json document, with the retrieved properties nested under a Clusters.desired_config element.
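The following Bash sketch chains these steps together, assuming the livy2-conf type and INITIAL tag from the previous steps:
Bash
curl -u admin:$password -sS -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/configurations?type=livy2-conf&tag=INITIAL" \
| jq --arg newtag "version$(date +%s%N)" '.items[] | del(.href, .version, .Config) | . + {tag: $newtag} | {"Clusters": {"desired_config": .}}' > newconfig.json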
Edit newconfig.json.
Open the newconfig.json document and modify/add values in the properties object. The following example changes the value of "livy.server.csrf_protection.enabled" from "true" to "false".
JSON
"livy.server.csrf_protection.enabled": "false",
Save the file once you're done making modifications.
Submit newconfig.json.
Use the following commands to submit the updated configuration to Ambari.
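A Bash sketch that PUTs the file to the cluster endpoint (Ambari requires the X-Requested-By header):
Bash
curl -u admin:$password -sS -H "X-Requested-By: ambari" \
-X PUT -d @newconfig.json "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName"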
These commands submit the contents of the newconfig.json file to the cluster as the new configuration. The request returns a JSON document. The versionTag element in this document should match the version you submitted, and the configs object contains the configuration changes you requested.
Restart a service component
At this point, the Ambari web UI indicates the Spark service needs to be restarted before the new configuration can take effect. Use the following steps to restart the service.
Use the following to enable maintenance mode for the Spark2 service:
Bash
curl -u admin:$password -sS -H "X-Requested-By: ambari" \
-X PUT -d '{"RequestInfo": {"context": "turning on maintenance mode for SPARK2"},"Body": {"ServiceInfo": {"maintenance_state":"ON"}}}' \
"https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/SPARK2"
PowerShell
$resp = Invoke-WebRequest -Uri "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/SPARK2" `
    -Credential $creds -UseBasicParsing `
    -Method PUT `
    -Headers @{"X-Requested-By" = "ambari"} `
    -Body '{"RequestInfo": {"context": "turning on maintenance mode for SPARK2"},"Body": {"ServiceInfo": {"maintenance_state":"ON"}}}'
Verify maintenance mode
These commands send a JSON document to the server that turns on maintenance mode. You can verify that the service is now in maintenance mode using the following request:
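A sketch that reads back the maintenance_state of the service:
Bash
curl -u admin:$password -sS -H "X-Requested-By: ambari" \
"https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/SPARK2" \
| jq .ServiceInfo.maintenance_state
Stop the Spark2 service.
With maintenance mode on, stop the service by setting its state to INSTALLED. This sketch assumes a free-form context string; the response includes an href element and a Requests element with an id value:
Bash
curl -u admin:$password -sS -H "X-Requested-By: ambari" \
-X PUT -d '{"RequestInfo": {"context": "stopping the SPARK2 service"},"Body": {"ServiceInfo": {"state": "INSTALLED"}}}' \
"https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/SPARK2"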
The href value returned by this URI uses the internal IP address of the cluster node. To use it from outside the cluster, replace the 10.0.0.18:8080 portion with the FQDN of the cluster.
Verify request.
Edit the command below by replacing 29 with the actual value for id returned from the prior step. The following commands retrieve the status of the request:
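A sketch using the Ambari requests endpoint:
Bash
# request_status progresses through values such as IN_PROGRESS and COMPLETED
curl -u admin:$password -sS -H "X-Requested-By: ambari" \
"https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/requests/29" \
| jq .Requests.request_status
Once the request reports COMPLETED, reverse the earlier steps: submit the same PUT with "state": "STARTED" to start the service again, and set maintenance_state back to OFF.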