Use Azure Kubernetes Service with Apache Kafka on HDInsight
Learn how to use Azure Kubernetes Service (AKS) with Apache Kafka on an HDInsight cluster. The steps in this document use a Node.js application hosted in AKS to verify connectivity with Kafka. This application uses the kafka-node package to communicate with Kafka. It uses Socket.io for event-driven messaging between the browser client and the back-end hosted in AKS.
Apache Kafka is an open-source distributed streaming platform that can be used to build real-time streaming data pipelines and applications. Azure Kubernetes Service manages your hosted Kubernetes environment, and makes it quick and easy to deploy containerized applications. Using an Azure Virtual Network, you can connect the two services.
Note
The focus of this document is on the steps required to enable Azure Kubernetes Service to communicate with Kafka on HDInsight. The example itself is just a basic Kafka client to demonstrate that the configuration works.
Prerequisites
- Azure CLI
- An Azure subscription
This document assumes that you are familiar with creating and using the following Azure services:
- Kafka on HDInsight
- Azure Kubernetes Service
- Azure Virtual Networks
This document also assumes that you have completed the Azure Kubernetes Service tutorial. That tutorial creates a container service, a Kubernetes cluster, and a container registry, and configures the kubectl utility.
Architecture
Network topology
Both HDInsight and AKS use an Azure Virtual Network as a container for compute resources. To enable communication between HDInsight and AKS, you must enable communication between their networks. The steps in this document use Virtual Network Peering to connect the networks. Other connections, such as VPN, should also work. For more information on peering, see the Virtual network peering document.
The following diagram illustrates the network topology used in this document:
Important
Name resolution is not enabled between the peered networks, so IP addressing is used. By default, Kafka on HDInsight is configured to return host names instead of IP addresses when clients connect. The steps in this document modify Kafka to use IP advertising instead.
Create an Azure Kubernetes Service (AKS)
If you do not already have an AKS cluster, use one of the following documents to learn how to create one:
- Deploy an Azure Kubernetes Service (AKS) cluster - Portal
- Deploy an Azure Kubernetes Service (AKS) cluster - CLI
- Deploy an Azure Kubernetes Service (AKS) cluster - PowerShell
Important
AKS creates a virtual network during installation in an additional resource group. The additional resource group follows the naming convention of MC_resourceGroup_AKSclusterName_location.
This network is peered to the one created for HDInsight in the next section.
Configure virtual network peering
Identify preliminary information
From the Azure portal, locate the additional Resource group that contains the virtual network for your AKS cluster.
From the resource group, select the Virtual network resource. Note the name for later use.
Under Settings, select Address space. Note the address space listed.
Create virtual network
To create a virtual network for HDInsight, navigate to + Create a resource > Networking > Virtual network.
Create the network using the following guidelines for certain properties:
Property | Value
Address space | You must use an address space that does not overlap the one used by the AKS cluster network.
Location | Use the same Location for the virtual network that you used for the AKS cluster.
Wait until the virtual network is created before proceeding to the next step.
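The non-overlap requirement for the two address spaces can be checked before the network is created. The following is a minimal sketch in Node.js, assuming both address spaces are IPv4 ranges in CIDR notation. The function names and the sample address spaces are hypothetical and are not part of the example application.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  const parts = ip.split('.');
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// Return the first and last address covered by a CIDR block such as "10.0.0.0/16".
function cidrToRange(cidr) {
  const [ip, prefixText] = cidr.split('/');
  if (!/^\d+$/.test(prefixText) || Number(prefixText) > 32) {
    throw new Error(`Invalid CIDR block: ${cidr}`);
  }
  const size = 2 ** (32 - Number(prefixText));
  const start = Math.floor(ipToInt(ip) / size) * size; // align to the block boundary
  return { start, end: start + size - 1 };
}

// Two blocks overlap when each one starts before the other one ends.
function cidrsOverlap(a, b) {
  const ra = cidrToRange(a);
  const rb = cidrToRange(b);
  return ra.start <= rb.end && rb.start <= ra.end;
}

// Hypothetical values: replace with the AKS address space you noted earlier
// and the address space you plan to use for the HDInsight network.
console.log(cidrsOverlap('10.0.0.0/8', '10.1.0.0/16'));   // overlapping pair
console.log(cidrsOverlap('10.0.0.0/8', '172.16.0.0/16')); // non-overlapping pair
```

If the check reports an overlap, choose a different address space for the HDInsight network before configuring the peering.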
Configure peering
To configure the peering between the HDInsight network and the AKS cluster network, select the virtual network and then select Peerings.
Select + Add and use the following values to populate the form:
Property | Value
Name of the peering from <this VN> to remote virtual network | Enter a unique name for this peering configuration.
Virtual network | Select the virtual network for the AKS cluster.
Name of the peering from <AKS VN> to <this VN> | Enter a unique name.
Leave all other fields at the default value, then select OK to configure the peering.
Create Apache Kafka cluster on HDInsight
When creating the Kafka on HDInsight cluster, you must join the virtual network created earlier for HDInsight. For more information on creating a Kafka cluster, see the Create an Apache Kafka cluster document.
Configure Apache Kafka IP Advertising
Use the following steps to configure Kafka to advertise IP addresses instead of domain names:
Using a web browser, go to https://CLUSTERNAME.azurehdinsight.net. Replace CLUSTERNAME with the name of the Kafka on HDInsight cluster. When prompted, use the HTTPS user name and password for the cluster. The Ambari Web UI for the cluster is displayed.
To view information on Kafka, select Kafka from the list on the left.
To view Kafka configuration, select Configs from the top middle.
To find the kafka-env configuration, enter kafka-env in the Filter field on the upper right.
To configure Kafka to advertise IP addresses, add the following text to the bottom of the kafka-env-template field:
# Configure Kafka to advertise IP addresses instead of FQDN
IP_ADDRESS=$(hostname -i)
echo advertised.listeners=$IP_ADDRESS
sed -i.bak -e '/advertised/{/advertised@/!d;}' /usr/hdp/current/kafka-broker/conf/server.properties
echo "advertised.listeners=PLAINTEXT://$IP_ADDRESS:9092" >> /usr/hdp/current/kafka-broker/conf/server.properties
To configure the interface that Kafka listens on, enter listeners in the Filter field on the upper right.
To configure Kafka to listen on all network interfaces, change the value in the listeners field to PLAINTEXT://0.0.0.0:9092.
To save the configuration changes, use the Save button. Enter a text message describing the changes. Select OK once the changes are saved.
To prevent errors when restarting Kafka, use the Service Actions button and select Turn On Maintenance Mode. Select OK to complete this operation.
To restart Kafka, use the Restart button and select Restart All Affected. Confirm the restart, and then use the OK button after the operation is completed.
To disable maintenance mode, use the Service Actions button and select Turn Off Maintenance Mode. Select OK to complete this operation.
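To make the effect of the kafka-env-template text easier to follow, the sketch below reproduces its edit to server.properties in Node.js on an in-memory string. It is an illustration only and is not what runs on the cluster. The function name, host name, and IP address are hypothetical.

```javascript
// Mimics the sed and echo lines added to kafka-env-template:
//   1. drop every line containing "advertised", unless it contains "advertised@"
//   2. append an advertised.listeners entry that uses the broker's IP address
function advertiseIp(serverProperties, ipAddress) {
  const kept = serverProperties
    .split('\n')
    .filter((line) => !line.includes('advertised') || line.includes('advertised@'));
  // Avoid a blank line before the appended entry when the input ends with a newline.
  if (kept.length > 0 && kept[kept.length - 1] === '') {
    kept.pop();
  }
  kept.push(`advertised.listeners=PLAINTEXT://${ipAddress}:9092`);
  return kept.join('\n') + '\n';
}

// Hypothetical server.properties content before the change.
const before = [
  'broker.id=1001',
  'advertised.listeners=PLAINTEXT://wn0-kafka.example.internal:9092',
  'listeners=PLAINTEXT://0.0.0.0:9092',
  '',
].join('\n');

console.log(advertiseIp(before, '10.1.0.5'));
```

The host-name based advertised.listeners entry is removed and replaced by one that uses the IP address, which is what allows clients in the peered network to connect without name resolution.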
Test the configuration
At this point, Kafka and Azure Kubernetes Service are in communication through the peered virtual networks. To test this connection, use the following steps:
Create a Kafka topic that is used by the test application. For information on creating Kafka topics, see the Create an Apache Kafka cluster document.
Download the example application from https://github.com/Blackmist/Kafka-AKS-Test.
Edit the index.js file and change the following lines:
- var topic = 'mytopic': Replace mytopic with the name of the Kafka topic used by this application.
- var brokerHost = '176.16.0.13:9092': Replace 176.16.0.13 with the internal IP address of one of the broker hosts for your cluster.
To find the internal IP address of the broker hosts (worker nodes) in the cluster, see the Apache Ambari REST API document. Pick the IP address of one of the entries where the domain name begins with wn.
From a command line in the src directory, install dependencies and use Docker to build an image for deployment:
docker build -t kafka-aks-test .
Note
Packages required by this application are checked into the repository, so you do not need to use the npm utility to install them.
Log in to your Azure Container Registry (ACR) and find the loginServer name:
az acr login --name <acrName>
az acr list --resource-group myResourceGroup --query "[].{acrLoginServer:loginServer}" --output table
Note
If you don't know your Azure Container Registry name, or are unfamiliar with using the Azure CLI to work with the Azure Kubernetes Service, see the AKS tutorials.
Tag the local kafka-aks-test image with the loginServer of your ACR. Also add :v1 to the end to indicate the image version:
docker tag kafka-aks-test <acrLoginServer>/kafka-aks-test:v1
Push the image to the registry:
docker push <acrLoginServer>/kafka-aks-test:v1
This operation takes several minutes to complete.
Edit the Kubernetes manifest file (kafka-aks-test.yaml) and replace microsoft with the ACR loginServer name retrieved earlier with az acr list.
Use the following command to deploy the application settings from the manifest:
kubectl create -f kafka-aks-test.yaml
Use the following command to watch for the EXTERNAL-IP of the application:
kubectl get service kafka-aks-test --watch
Once an external IP address is assigned, use CTRL + C to exit the watch.
Open a web browser and enter the external IP address for the service. You arrive at a page similar to the following image:
Enter text into the field and then select the Send button. The data is sent to Kafka. Then the Kafka consumer in the application reads the message and adds it to the Messages from Kafka section.
Warning
You may receive multiple copies of a message. This problem usually happens when you refresh your browser after connecting, or open multiple browser connections to the application.
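When editing index.js, the brokerHost value comes from the worker node entries returned by Ambari. The sketch below shows one way to pick those entries from an already parsed host listing. It assumes the listing has the shape shown in the sample (an items array whose entries carry Hosts.host_name and Hosts.ip), which you should verify against the actual output for your cluster. The function name, host names, and IP addresses are hypothetical.

```javascript
// Assumed response shape (verify against your cluster's actual output):
// { items: [ { Hosts: { host_name: 'wn0-...', ip: '10.1.0.5' } }, ... ] }
function workerBrokerHosts(ambariHosts, port = 9092) {
  return (ambariHosts.items || [])
    .map((item) => item.Hosts || {})
    // Worker nodes (the Kafka brokers) have host names that begin with "wn".
    .filter((h) => typeof h.host_name === 'string' && h.host_name.startsWith('wn') && h.ip)
    .map((h) => `${h.ip}:${port}`);
}

// Hypothetical host listing with head, worker, and ZooKeeper nodes.
const sample = {
  items: [
    { Hosts: { host_name: 'hn0-kafka.example.internal', ip: '10.1.0.18' } },
    { Hosts: { host_name: 'wn0-kafka.example.internal', ip: '10.1.0.5' } },
    { Hosts: { host_name: 'wn1-kafka.example.internal', ip: '10.1.0.6' } },
    { Hosts: { host_name: 'zk0-kafka.example.internal', ip: '10.1.0.11' } },
  ],
};

// Any one worker entry can be used as the brokerHost value in index.js.
const brokerHost = workerBrokerHosts(sample)[0];
console.log(brokerHost);
```

Only the wn entries are kept because the head and ZooKeeper nodes do not run Kafka brokers.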
Next steps
Use the following links to learn how to use Apache Kafka on HDInsight: