Редактиране

Споделяне чрез


Ports used by Apache Hadoop services on HDInsight

This document provides a list of the ports used by Apache Hadoop services running on HDInsight clusters. It also provides information on ports used to connect to the cluster using SSH.

Public ports vs. non-public ports

Linux-based HDInsight clusters only expose three ports publicly on the internet: 22, 23, and 443. These ports secure cluster access using SSH and services exposed over the secure HTTPS protocol.

HDInsight is implemented by several Azure Virtual Machines (cluster nodes) running on an Azure Virtual Network. From within the virtual network, you can access ports not exposed over the internet. If you connect via SSH to the head node, you can directly access services running on the cluster nodes.

Important

If you do not specify an Azure Virtual Network as a configuration option for HDInsight, one is created automatically. However, you can't join other machines (such as other Azure Virtual Machines or your client development machine) to this virtual network.

To join additional machines to the virtual network, you must create the virtual network first, and then specify it when creating your HDInsight cluster. For more information, see Plan a virtual network for HDInsight.

Public ports

All the nodes in an HDInsight cluster are located in an Azure Virtual Network. The nodes can't be directly accessed from the internet. A public gateway provides internet access to the following ports, which are common across all HDInsight cluster types.

Service Port Protocol Description
sshd 22 SSH Connects clients to sshd on the primary headnode. For more information, see Use SSH with HDInsight.
sshd 22 SSH Connects clients to sshd on the edge node. For more information, see Use SSH with HDInsight.
sshd 23 SSH Connects clients to sshd on the secondary headnode. For more information, see Use SSH with HDInsight.
Ambari 443 HTTPS Ambari web UI. See Manage HDInsight using the Apache Ambari Web UI
Ambari 443 HTTPS Ambari REST API. See Manage HDInsight using the Apache Ambari REST API
WebHCat 443 HTTPS HCatalog REST API. See Use MapReduce with Curl
HiveServer2 443 ODBC Connects to Hive using ODBC. See Connect Excel to HDInsight with the Microsoft ODBC driver.
HiveServer2 443 JDBC Connects to ApacheHive using JDBC. See Connect to Apache Hive on HDInsight using the Hive JDBC driver

The following are available for specific cluster types:

Service Port Protocol Cluster type Description
Stargate 443 HTTPS HBase HBase REST API. See Get started using Apache HBase
Livy 443 HTTPS Spark Spark REST API. See Submit Apache Spark jobs remotely using Apache Livy
Spark Thrift server 443 HTTPS Spark Spark Thrift server used to submit Hive queries. See Use Beeline with Apache Hive on HDInsight
Kafka REST proxy 443 HTTPS Kafka Kafka REST API. See Interact with Apache Kafka clusters in Azure HDInsight using a REST proxy

Authentication

All services publicly exposed on the internet must be authenticated:

Port Credentials
22 or 23 The SSH user credentials specified during cluster creation
443 The login name (default: admin) and password that were set during cluster creation

Non-public ports

Note

Some services are only available on specific cluster types. For example, HBase is only available on HBase cluster types.

Important

Some services only run on one headnode at a time. If you attempt to connect to the service on the primary headnode and receive an error, retry using the secondary headnode.

Ambari

Service Nodes Port URL path Protocol
Ambari web UI Head nodes 8080 / HTTP
Ambari REST API Head nodes 8080 /api/v1 HTTP

Examples:

  • Ambari REST API: curl -u admin "http://10.0.0.11:8080/api/v1/clusters"

HDFS ports

Service Nodes Port Protocol Description
NameNode web UI Head nodes 30070 HTTPS Web UI to view status
NameNode metadata service head nodes 8020 IPC File system metadata
DataNode All worker nodes 30075 HTTPS Web UI to view status, logs, and so on.
DataNode All worker nodes 30010   Data transfer
DataNode All worker nodes 30020 IPC Metadata operations
Secondary NameNode Head nodes 50090 HTTP Checkpoint for NameNode metadata

YARN ports

Service Nodes Port Protocol Description
Resource Manager web UI Head nodes 8088 HTTP Web UI for Resource Manager
Resource Manager web UI Head nodes 8090 HTTPS Web UI for Resource Manager
Resource Manager admin interface head nodes 8141 IPC For application submissions (Hive, Hive server, Pig, and so on.)
Resource Manager scheduler head nodes 8030 HTTP Administrative interface
Resource Manager application interface head nodes 8050 HTTP Address of the applications manager interface
NodeManager All worker nodes 30050   The address of the container manager
NodeManager web UI All worker nodes 30060 HTTP Resource Manager interface
Timeline address Head nodes 10200 RPC The Timeline service RPC service.
Timeline web UI Head nodes 8188 HTTP The Timeline service web UI

Hive ports

Service Nodes Port Protocol Description
HiveServer2 Head nodes 10001 Thrift Service for connecting to Hive (Thrift/JDBC)
Hive Metastore Head nodes 9083 Thrift Service for connecting to Hive metadata (Thrift/JDBC)

WebHCat ports

Service Nodes Port Protocol Description
WebHCat server Head nodes 30111 HTTP Web API on top of HCatalog and other Hadoop services

MapReduce ports

Service Nodes Port Protocol Description
JobHistory Head nodes 19888 HTTP MapReduce JobHistory web UI
JobHistory Head nodes 10020   MapReduce JobHistory server
ShuffleHandler   13562   Transfers intermediate Map outputs to requesting Reducers

Oozie

Service Nodes Port Protocol Description
Oozie server Head nodes 11000 HTTP URL for Oozie service
Oozie server Head nodes 11001 HTTP Port for Oozie admin

Ambari Metrics

Service Nodes Port Protocol Description
TimeLine (Application history) Head nodes 6188 HTTP The TimeLine service web UI
TimeLine (Application history) Head nodes 30200 RPC The TimeLine service web UI

HBase ports

Service Nodes Port Protocol Description
HMaster Head nodes 16000    
HMaster info Web UI Head nodes 16010 HTTP The port for the HBase Master web UI
Region server All worker nodes 16020  
Region server info Web UI   All worker nodes 16030 HTTP The port for the HBase Region server Web UI
2181 The port that clients use to connect to ZooKeeper

Kafka ports

Service Nodes Port Protocol Description
Broker Worker nodes 9092 Kafka Wire Protocol Used for client communication
  Zookeeper nodes 2181   The port that clients use to connect to Zookeeper
REST proxy Kafka management nodes 9400 HTTPS Kafka REST specification

Spark ports

Service Nodes Port Protocol URL path Description
Spark Thrift servers Head nodes 10002 Thrift   Service for connecting to Spark SQL (Thrift/JDBC)
Livy server Head nodes 8998 HTTP   Service for running statements, jobs, and applications
Jupyter Notebook Head nodes 8001 HTTP   Jupyter Notebook website

Examples:

  • Livy: curl -u admin -G "http://10.0.0.11:8998/". In this example, 10.0.0.11 is the IP address of the headnode that hosts the Livy service.