AttilaSimon-0012 asked AttilaSimon-0012 answered

Collision on non-unique "headnodehost" hostname across HDInsight clusters

The context:
Suppose I have multiple HDInsight 4.0 clusters and want to access the Hadoop services running inside them, e.g. the JobHistory server. I get the corresponding JobHistory address from each cluster via the Ambari client configuration API. To be precise, I fetch the value of the "mapreduce.jobhistory.address" Hadoop property.

Ambari answers with the string "headnodehost:10020". That is fine if I'm on a cluster node, since every node has an /etc/hosts file that knows the "headnodehost" hostname:
73400-image.png

But I'm not on a cluster node! I have a setup that registers the unique hostnames of all nodes of every HDInsight cluster, so I can reach them from my node. In other words, I'm on a node that can reach all of my HDInsight clusters (network connectivity is provided). As you would guess, this is where things get complicated: what should I do with "headnodehost"? I cannot use the returned "headnodehost" hostname to establish TCP/IP connectivity, simply because every one of my HDInsight clusters has one, and it resolves to a different internal IP in each cluster.

One might mistakenly suggest that I could just as well look up the unique hostname of that very same node, such as "hn0-hdi101.iuyf3i2yrrvetpdqnyswcj2c3b.fx.internal.cloudapp.net" or "hn0-hdi101", and use that for TCP/IP. But my automation (and the client libraries) rely on the "mapreduce.jobhistory.address" Hadoop property, as well as on the following properties fetched from the Hadoop cluster via Ambari, so this approach would be a bottomless rabbit hole:

73737-image.png
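To make the dismissed workaround concrete, here is a minimal sketch of a client-side rewrite that replaces the shared alias with a unique hostname in every fetched property. The helper name `rewrite_headnode_alias` and the example property set are assumptions for illustration (the actual properties in the screenshot are not reproduced here); only "mapreduce.jobhistory.address" and its "headnodehost:10020" value come from the description above.

```python
from typing import Dict

ALIAS = "headnodehost"  # the shared alias reported by Ambari


def rewrite_headnode_alias(props: Dict[str, str], unique_host: str) -> Dict[str, str]:
    """Return a copy of props with the alias host replaced by unique_host.

    Only values of the exact form "headnodehost:<port>" are rewritten;
    everything else is copied unchanged.
    """
    rewritten = {}
    for key, value in props.items():
        host, sep, port = value.partition(":")
        if host == ALIAS and sep:
            rewritten[key] = f"{unique_host}:{port}"
        else:
            rewritten[key] = value
    return rewritten


# Illustrative input; the property set is an assumption, not the screenshot's content.
from_ambari = {
    "mapreduce.jobhistory.address": "headnodehost:10020",
    "mapreduce.jobhistory.webapp.address": "headnodehost:19888",
    "mapreduce.framework.name": "yarn",
}
fixed = rewrite_headnode_alias(from_ambari, "hn0-hdi101")
print(fixed["mapreduce.jobhistory.address"])  # hn0-hdi101:10020
```

The sketch also shows why this is fragile: every property that embeds the alias has to be found and rewritten per cluster, and the chosen unique hostname goes stale whenever the service moves to another node.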

My questions:

  • Is it possible to provision the HDInsight cluster so that the JobHistory service is configured with one of the unique hostnames?

  • Alternatively, is it possible to make the "headnodehost" alias globally unique? For example, prefix it with the cluster ID, as in "hdi101.headnodehost", and configure that for Hadoop services such as the JobHistory server during cluster creation? Additionally, keeping the "headnodehost" entry in the cluster's /etc/hosts would maintain backward compatibility for existing applications.




azure-hdinsight

Hi @AttilaSimon-0012,

Welcome to Microsoft Q&A forum and thanks for your query.

We are reaching out to the internal team for more details on this request and will get back to you as soon as we have an update.

Thank you for your patience.


Hi @AttilaSimon-0012,

The product team would like to understand the reason for getting internal endpoints from these configs. Is there a special use case?

Thank you


My application relies on the Hadoop, HDFS, MapReduce, YARN, etc. client libraries, which require direct access to these services. These libraries are traditionally configured through the aforementioned properties. I don't think this is special by any means.


Hi @AttilaSimon-0012,

Sorry for the delay in my response. After a conversation with the internal teams, here are their inputs.

  1. We can set the unique name of either HN0 or HN1, but whenever a failover of the master services occurs, you would need to change the hostname to the node where the JobHistory server is now running. The JobHistory server always runs on the active headnode, which is why we set it to headnodehost:10020; the alias takes care of this automatically. Here is a reference showing which highly available service runs on which node:

High availability components in Azure HDInsight | Microsoft Docs
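As a rough illustration of the failover concern, the sketch below probes both headnodes from an external client and picks the one that currently accepts connections on the JobHistory port. It assumes that port 10020 is reachable only on the active headnode, which follows from the statement above but is not verified here. The function names and the hostname "hn1-hdi101" are hypothetical.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_active_headnode(
    candidates: Iterable[str],
    port: int = 10020,
    probe: Callable[[str, int], bool] = tcp_probe,
) -> Optional[str]:
    """Return the first candidate that accepts connections on port, else None."""
    for host in candidates:
        if probe(host, port):
            return host
    return None


# Offline demonstration with a stand-in probe: pretend hn1 is currently active.
def fake_probe(host: str, port: int) -> bool:
    return host == "hn1-hdi101"


print(find_active_headnode(["hn0-hdi101", "hn1-hdi101"], probe=fake_probe))  # hn1-hdi101
```

A client pinned to a unique hostname would have to repeat this kind of check after every failover, which is the bookkeeping the headnodehost alias handles inside the cluster.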

Continued in the comment below.


1 Answer

AttilaSimon-0012 answered


Hi KranthiPakala-MSFT,

I think it might be better to give a very direct suggestion to illustrate what I want to achieve here. (Sorry for this approach; I hope it isn't inappropriate. I mean it with good intent, to give you a different perspective.)

  • Please configure the JobHistory and Timeline Service services so that the above-mentioned properties refer to <clustername>.headnodehost (following my example above, this would give e.g. mapreduce.jobhistory.address=hdi101.headnodehost:10020). It is important that Ambari reports this value.

  • Please add the <clustername>.headnodehost alias to networking. You could maintain its IP address during failovers the same way as for the headnodehost alias. This would give an additional, externally referable and unique hostname alias for the master node, which won't collide even if I have multiple HDInsight 4.0 clusters in the network at any given time.

  • You can keep all the existing hostname aliases as shown in my screenshot (including the headnodehost alias, to maintain backward compatibility for you).
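The naming scheme suggested in these points can be sketched as follows. This is only an illustration of the proposal, not existing HDInsight behavior: the helper names, the second cluster "hdi102", and the IP addresses are placeholders.

```python
from typing import Dict

ALIAS = "headnodehost"  # the existing, non-unique alias


def namespaced_address(cluster: str, address: str) -> str:
    """Prefix the shared alias with the cluster name, keeping the port."""
    host, sep, port = address.partition(":")
    if host != ALIAS:
        return address  # not the alias, leave untouched
    return f"{cluster}.{ALIAS}{sep}{port}"


def hosts_entries(active_headnode_ip: Dict[str, str]) -> str:
    """Render hosts-file lines mapping each cluster's alias to its active headnode IP."""
    return "\n".join(
        f"{ip}\t{cluster}.{ALIAS}"
        for cluster, ip in sorted(active_headnode_ip.items())
    )


print(namespaced_address("hdi101", "headnodehost:10020"))  # hdi101.headnodehost:10020
# Placeholder IPs: one entry per cluster, so the names no longer collide.
print(hosts_entries({"hdi101": "10.0.0.18", "hdi102": "10.1.0.18"}))
```

With names of this shape, a node that sees several clusters can resolve each cluster's active headnode independently, while the plain headnodehost entry can stay in each cluster's /etc/hosts.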

Cheers
