Azure HDInsight: Step-by-step automated secure deployment
1. Introduction
Azure HDInsight is an Apache Hadoop distribution that runs as a managed service in the cloud. It scales from terabytes to petabytes on demand, and you can spin up additional nodes at any time.
Since HDInsight is a PaaS offering, it is by default publicly accessible from any internet connection. The cluster often contains valuable customer data, and customers have requirements for how that data may be reached, for example IP restrictions so that only their own block of IP addresses can connect to the cluster.
This wiki article shows how to deploy a Spark 1.6.1 (HDI 3.4) cluster on Azure in a fully automated and secured way. The resulting cluster is accessible only from the IP ranges you specify.
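To make the idea of an IP restriction concrete, here is a minimal sketch in Python of the check such a restriction performs: a connection is allowed only if its source address falls inside one of the permitted blocks. It is not part of the deployment files. The function name and the example ranges (taken from the reserved documentation networks) are assumptions for illustration only.

```python
import ipaddress

# Hypothetical allow-list; these are reserved documentation ranges (RFC 5737),
# not real customer networks.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.16/28"]


def is_allowed(source_ip, allowed_ranges=ALLOWED_RANGES):
    """Return True if source_ip lies inside any of the allowed CIDR blocks."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(block) for block in allowed_ranges)


if __name__ == "__main__":
    print(is_allowed("203.0.113.25"))  # inside the first block
    print(is_allowed("192.0.2.1"))     # outside both blocks
```

In the deployment described here, this filtering is done by Azure itself through the Network Security Group rules, not by code you run.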
2. Ingredients
We need the following files to do our deployment:
- Deploy-AzureHDInsightEnvironment.ps1 - download the file here, rename from .tst to .ps1
- Template.json - download the file here
- Parameters.json - download the file here
- Azure PowerShell Module - download the module here
3. Let's deploy
- Download all the files listed in chapter 2 and place them in C:\TEMP.
- Change the values in Template.json:
  - You must change the values on lines 15, 31, 34, 35, 51, 55, 56, 70, 74 and 75.
  - Change <subscription-id> to your Azure subscription ID.
  - Optionally, change the values on lines 17, 19, 85 and 87.
- Change the value in Parameters.json. You must change the value of AZU-HCL-HDINSIGHT-PRD.
- Change the variables in Deploy-AzureHDInsightEnvironment.ps1.
- Now that the deployment framework is ready, it's time to deploy the HDInsight cluster.
- Run Deploy-AzureHDInsightEnvironment.ps1 as Administrator and log in to Azure.
- Wait for the cluster to be created. This might take 45 minutes.
- Once the cluster is created, it is not accessible from the internet. You must add your IP address (or address range) to the inbound rules of the Network Security Group. See the existing rules for an example. Applying the network security rules might take 15 minutes.
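The last step can also be written down as data. The sketch below is in Python and is not part of the downloaded files. It builds one inbound rule in the shape that Azure Resource Manager templates use for the securityRules of a network security group. The function name, rule name, priority, port 443 and the example address range are all assumptions; take the real values from the existing rules in your own group.

```python
import ipaddress
import json


def build_inbound_allow_rule(name, source_range, priority, port="443"):
    """Build one inbound 'Allow' rule as a dict in ARM securityRules shape.

    All argument values are chosen by the caller; nothing here is taken
    from the article's Template.json.
    """
    # Fail early on a malformed range; strict=False also accepts a single
    # host address such as "203.0.113.25" and normalises it to a /32.
    network = ipaddress.ip_network(source_range, strict=False)

    # Azure accepts rule priorities from 100 to 4096 (lower numbers are
    # evaluated first).
    if not 100 <= priority <= 4096:
        raise ValueError("priority must be between 100 and 4096")

    return {
        "name": name,
        "properties": {
            "protocol": "Tcp",
            "sourcePortRange": "*",
            "destinationPortRange": port,
            "sourceAddressPrefix": str(network),
            "destinationAddressPrefix": "*",
            "access": "Allow",
            "priority": priority,
            "direction": "Inbound",
        },
    }


if __name__ == "__main__":
    # Hypothetical rule for an office network (documentation range).
    rule = build_inbound_allow_rule("Allow-Office-HTTPS", "203.0.113.0/24", 200)
    print(json.dumps(rule, indent=2))
```

Validating the range before building the rule catches a mistyped address locally, which is cheaper than waiting up to 15 minutes for a rule to apply and then finding that it does not match.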