Manage Apache Hadoop clusters in HDInsight by using the Azure portal
By using the Azure portal, you can manage Apache Hadoop clusters in Azure HDInsight.
Set key/value pairs to define a custom taxonomy of your cloud services. For example, you might create a key named project, and then use a common value for all services associated with a specific project.
Diagnose and solve problems
Display troubleshooting information.
Quickstart
Display information that helps you get started using HDInsight.
Tools
Help information for HDInsight-related tools.
Settings menu
Item | Description
Cluster size | Check, increase, and decrease the number of cluster worker nodes. See Scale clusters.
Quota limits | Display the used and available cores for your subscription.
SSH + Cluster login | Show the instructions to connect to the cluster by using a Secure Shell (SSH) connection. For more information, see Use SSH with HDInsight.
Select Move to another resource group or Move to another subscription.
Follow the instructions on the new page.
Delete clusters
Deleting a cluster doesn't delete the default storage account or any linked storage accounts. You can re-create the cluster by using the same storage accounts and the same metastores. We recommend that you use a new default blob container when you re-create the cluster.
You can add more Azure Storage accounts and Azure Data Lake Storage accounts after a cluster is created. For more information, see Add additional storage accounts to HDInsight.
Scale clusters
You can use the cluster scaling feature to change the number of worker nodes that are used by an HDInsight cluster, without having to re-create the cluster.
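Scaling can also be scripted instead of done through the portal. The following is a minimal sketch, assuming the Azure CLI is installed and signed in; it only builds the `az hdinsight resize` command line and doesn't run it. The function name and the placeholder cluster and resource group names are hypothetical.

```python
import shlex
from typing import List


def build_resize_command(cluster_name: str, resource_group: str,
                         worker_count: int) -> List[str]:
    """Return the argv for `az hdinsight resize` (the command is not executed here)."""
    # Sanity check only; the service may enforce its own limits.
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    return [
        "az", "hdinsight", "resize",
        "--name", cluster_name,
        "--resource-group", resource_group,
        "--workernode-count", str(worker_count),
    ]


if __name__ == "__main__":
    # Placeholder names; replace with a real cluster and resource group.
    print(shlex.join(build_resize_command("mycluster", "myresourcegroup", 5)))
```

To apply the change, the returned list could be passed to `subprocess.run(..., check=True)`.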
Most Hadoop jobs are batch jobs that run only occasionally, so most Hadoop clusters sit idle for long periods. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're charged for an HDInsight cluster even when it isn't in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they aren't in use.
You can program the process in many ways. You can use:
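As one illustration of programming the process, a scheduled script could delete a cluster that has been idle for a while. This is a sketch under stated assumptions: the idle rule (`is_idle`) and its threshold are hypothetical policy choices, not part of HDInsight, and the script only builds the `az hdinsight delete` command line without running it.

```python
from datetime import datetime, timedelta
from typing import List


def is_idle(last_job_finished: datetime, now: datetime,
            threshold: timedelta) -> bool:
    """Hypothetical policy: the cluster counts as idle once `threshold` has elapsed."""
    return now - last_job_finished >= threshold


def build_delete_command(cluster_name: str, resource_group: str) -> List[str]:
    """Return the argv for `az hdinsight delete`; --yes skips the confirmation prompt."""
    return [
        "az", "hdinsight", "delete",
        "--name", cluster_name,
        "--resource-group", resource_group,
        "--yes",
    ]


if __name__ == "__main__":
    # Placeholder values; a real script would look up when the last job finished.
    last_job = datetime(2024, 1, 1, 0, 0)
    now = datetime(2024, 1, 1, 3, 0)
    if is_idle(last_job, now, timedelta(hours=2)):
        print(" ".join(build_delete_command("mycluster", "myresourcegroup")))
```

Because deleting the cluster doesn't delete its storage accounts, the data remains available when the cluster is re-created.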
Ambari provides an intuitive, easy-to-use Hadoop management web UI that's backed by its RESTful APIs. With Ambari, system administrators can manage and monitor Hadoop clusters.
An HDInsight cluster can have two user accounts. The HDInsight cluster user account (HTTP user account) and the SSH user account are created during the creation process. You can use the portal to change the cluster user account password and use script actions to change the SSH user account.
Change the cluster user password
Note
Changing the cluster user (admin) password might cause script actions that run against this cluster to fail. If you have any persisted script actions that target worker nodes, these scripts might fail when you add nodes to the cluster through resize operations. For more information on script actions, see Customize HDInsight clusters by using script actions.
Upload the file to a storage location that you can access from HDInsight by using an HTTP or HTTPS address. An example is a public file store such as OneDrive or Azure Blob Storage. Save the URI (HTTP or HTTPS address) to the file. The URI is needed in the next step.
On the Submit script action page, enter the information in the following table.
Note
SSH passwords can't contain the following characters: " ' ` / \ < % ~ | $ & ! #
Field | Value
Script type | Select - Custom from the dropdown list.
Name | "Change ssh credentials."
Bash script URI | The URI to the changecredentials.sh file.
Node types: Head, Worker, Nimbus, Supervisor, or ZooKeeper | Select ✓ for all node types listed.
Parameters | Enter the SSH username, and then enter the new password. There should be only one space between the username and the password.
Persist this script action ... | Leave this field clear.
Select Create to apply the script. After the script finishes, you can connect to the cluster by using SSH with the new credentials.
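Before you submit the script action, it can help to check the new password against the character restriction in the earlier note and to build the Parameters value with exactly one space. The following is a minimal sketch; the function and constant names are hypothetical, and the whitespace check is an added assumption based on the single-space separator.

```python
# Characters that the note says SSH passwords can't contain.
FORBIDDEN_SSH_PASSWORD_CHARS = set('"\'`/\\<%~|$&!#')


def build_script_parameters(username: str, password: str) -> str:
    """Return the Parameters value: username, one space, new password."""
    if not username or not password:
        raise ValueError("username and password must not be empty")
    # Assumption: whitespace inside either value would break the single-space separator.
    if any(ch.isspace() for ch in username + password):
        raise ValueError("username and password must not contain whitespace")
    bad = sorted(set(password) & FORBIDDEN_SSH_PASSWORD_CHARS)
    if bad:
        raise ValueError("password contains forbidden characters: " + " ".join(bad))
    return f"{username} {password}"


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(build_script_parameters("sshuser", "N3w-Pass+word"))
```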
Find the subscription ID
Each cluster is tied to an Azure subscription. The Azure subscription ID is visible on the cluster home page.
Find the resource group
In Azure Resource Manager mode, each HDInsight cluster is created in a resource group. The resource group is visible on the cluster home page.
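The subscription ID and resource group also appear in the cluster's Azure Resource Manager resource ID, which scripts often have on hand. The following sketch splits such an ID into its parts. The function name is hypothetical, and the sample ID in the usage section is a placeholder.

```python
from typing import Dict


def parse_cluster_resource_id(resource_id: str) -> Dict[str, str]:
    """Split an HDInsight cluster resource ID into subscription, group, and name."""
    parts = resource_id.strip("/").split("/")
    # Expected shape:
    # subscriptions/<id>/resourceGroups/<group>/providers/Microsoft.HDInsight/clusters/<name>
    expected = {0: "subscriptions", 2: "resourcegroups", 4: "providers",
                5: "microsoft.hdinsight", 6: "clusters"}
    if len(parts) != 8 or any(parts[i].lower() != v for i, v in expected.items()):
        raise ValueError("not an HDInsight cluster resource ID: " + resource_id)
    return {
        "subscription_id": parts[1],
        "resource_group": parts[3],
        "cluster_name": parts[7],
    }


if __name__ == "__main__":
    sample = ("/subscriptions/00000000-0000-0000-0000-000000000000"
              "/resourceGroups/myresourcegroup"
              "/providers/Microsoft.HDInsight/clusters/mycluster")
    print(parse_cluster_resource_id(sample))
```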
Find the storage accounts
HDInsight clusters use either an Azure Storage account or Data Lake Storage to store data. Each HDInsight cluster can have one default storage account and many linked storage accounts. To list the storage accounts, on the cluster home page, under Settings, select Storage accounts.
The Cluster size tile on the cluster home page displays the number of cores allocated to this cluster and how they're allocated for the nodes within this cluster.
Important
To monitor the services provided by the HDInsight cluster, you must use the Ambari web UI or the Ambari REST API. For more information on using Ambari, see Manage HDInsight clusters by using Apache Ambari.
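As a rough sketch of the REST route, the following code prepares a request for the state of each service and summarizes a response. It assumes the usual HDInsight Ambari endpoint form, `https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME`, and HTTP basic authentication with the cluster user account. The function names, credentials, and sample payload are hypothetical, and no network call is made here.

```python
import base64
import urllib.request
from typing import Dict


def ambari_services_request(cluster_name: str, user: str,
                            password: str) -> urllib.request.Request:
    """Build (but don't send) a GET request for the state of every service."""
    url = (f"https://{cluster_name}.azurehdinsight.net/api/v1/clusters/"
           f"{cluster_name}/services?fields=ServiceInfo/state")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def summarize_states(payload: dict) -> Dict[str, str]:
    """Map service name to state from a decoded Ambari JSON response."""
    return {
        item["ServiceInfo"]["service_name"]: item["ServiceInfo"]["state"]
        for item in payload.get("items", [])
    }


if __name__ == "__main__":
    req = ambari_services_request("mycluster", "admin", "secret")
    print(req.full_url)
    # Illustrative payload shaped like an Ambari response, not real cluster output.
    sample = {"items": [{"ServiceInfo": {"service_name": "HDFS", "state": "STARTED"}}]}
    print(summarize_states(sample))
```

To send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`.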