Configuration of Azure Monitor edge pipeline

Azure Monitor pipeline is a data ingestion pipeline that provides consistent and centralized data collection for Azure Monitor. The edge pipeline enables at-scale collection and routing of telemetry data before it's sent to the cloud. It can cache data locally and sync with the cloud when connectivity is restored, and it can route telemetry to Azure Monitor when the network is segmented and data can't be sent directly to the cloud. This article describes how to enable and configure the edge pipeline in your environment.

Overview

The Azure Monitor edge pipeline is a containerized solution that is deployed on an Arc-enabled Kubernetes cluster and leverages OpenTelemetry Collector as a foundation. The following diagram shows the components of the edge pipeline. One or more data flows listen for incoming data from clients, and the pipeline extension forwards the data to the cloud, using the local cache if necessary.

The pipeline configuration file defines the data flows and cache properties for the edge pipeline. The data collection rule (DCR) defines the schema of the data being sent to the cloud pipeline, a transformation to filter or modify the data, and the destination where the data should be sent. Each data flow definition in the pipeline configuration specifies the DCR, and the stream within that DCR, that will process that data in the cloud pipeline.

Overview diagram of the dataflow for Azure Monitor edge pipeline.

Note

Private link is supported by edge pipeline for the connection to the cloud pipeline.

The following components and configurations are required to enable the Azure Monitor edge pipeline. If you use the Azure portal to configure the edge pipeline, then each of these components is created for you. With other methods, you need to configure each one.

Component | Description
Edge pipeline controller extension | Extension added to your Arc-enabled Kubernetes cluster to support pipeline functionality (microsoft.monitor.pipelinecontroller).
Edge pipeline controller instance | Instance of the edge pipeline running on your Arc-enabled Kubernetes cluster.
Data flow | Combination of receivers and exporters that run on the pipeline controller instance. Receivers accept data from clients, and exporters deliver that data to Azure Monitor.
Pipeline configuration | Configuration file that defines the data flows for the pipeline instance. Each data flow includes a receiver and an exporter. The receiver listens for incoming data, and the exporter sends the data to the destination.
Data collection endpoint (DCE) | Endpoint where the data is sent to the Azure Monitor pipeline. The pipeline configuration includes a property for the URL of the DCE so the pipeline instance knows where to send the data.

Configuration | Description
Data collection rule (DCR) | Configuration file that defines how the data is received in the cloud pipeline and where it's sent. The DCR can also include a transformation to filter or modify the data before it's sent to the destination.
Pipeline configuration | Configuration that defines the data flows and cache for the pipeline instance.

Supported configurations

Supported distros
Edge pipeline is supported on the following Kubernetes distributions:

  • Canonical
  • Cluster API Provider for Azure
  • K3s
  • Rancher Kubernetes Engine
  • VMware Tanzu Kubernetes Grid

Supported locations
Edge pipeline is supported in the following Azure regions:

  • East US 2
  • West US 2
  • West Europe

Prerequisites

Workflow

You don't need a detailed understanding of the steps performed by the Azure Monitor pipeline to configure it using the Azure portal. You may need a more detailed understanding, though, if you use another method of installation or if you need to perform more advanced configuration such as transforming the data before it's stored in its destination.

The following tables and diagrams describe the detailed steps and components in the process for collecting data using the edge pipeline and passing it to the cloud pipeline for storage in Azure Monitor. Also included in the tables is the configuration required for each of those components.

Step | Action | Supporting configuration
1 | Client sends data to the edge pipeline receiver. | Client is configured with the IP address and port of the edge pipeline receiver and sends data in the expected format for the receiver type.
2 | Receiver forwards data to the exporter. | Receiver and exporter are configured in the same pipeline.
3 | Exporter tries to send the data to the cloud pipeline. | Exporter in the pipeline configuration includes the URL of the DCE, a unique identifier for the DCR, and the stream in the DCR that defines how the data will be processed.
3a | Exporter stores data in the local cache if it can't connect to the DCE. | A persistent volume is available for the cache, and the local cache is enabled in the pipeline configuration.

Detailed diagram of the steps and components for data collection using Azure Monitor edge pipeline.

Step | Action | Supporting configuration
4 | Cloud pipeline accepts the incoming data. | The DCR includes a schema definition for the incoming stream that must match the schema of the data coming from the edge pipeline.
5 | Cloud pipeline applies a transformation to the data. | The DCR includes a transformation that filters or modifies the data before it's sent to the destination. The transformation may filter data, remove or add columns, or completely change its schema. The output of the transformation must match the schema of the destination table.
6 | Cloud pipeline sends the data to the destination. | The DCR includes a destination that specifies the Log Analytics workspace and table where the data will be stored.

Detailed diagram of the steps and components for data collection using Azure Monitor cloud pipeline.

Segmented network

Network segmentation is a model where you use software-defined perimeters to create a different security posture for different parts of your network. In this model, you may have a network segment that can't connect to the internet or to other network segments. The edge pipeline can be used to collect data from these network segments and send it to the cloud pipeline.

Diagram of a layered network for Azure Monitor edge pipeline.

To use Azure Monitor pipeline in a layered network configuration, you must add the following entries to the allowlist for the Arc-enabled Kubernetes cluster. See Configure Azure IoT Layered Network Management Preview on level 4 cluster.

- destinationUrl: "*.ingest.monitor.azure.com"
  destinationType: external
- destinationUrl: "login.windows.net"
  destinationType: external

Create table in Log Analytics workspace

Before you configure the data collection process for the edge pipeline, you need to create a table in the Log Analytics workspace to receive the data. This must be a custom table since built-in tables aren't currently supported. The schema of the table must match the data that it receives, but because there are multiple steps in the collection process where you can modify the incoming data, the table schema doesn't need to match the source data that you're collecting. The only requirement for the table in the Log Analytics workspace is that it has a TimeGenerated column.

See Add or delete tables and columns in Azure Monitor Logs for details on different methods for creating a table. For example, use the CLI command below to create a table with the three columns called Body, TimeGenerated, and SeverityText.

az monitor log-analytics workspace table create --workspace-name my-workspace --resource-group my-resource-group  --name my-table_CL --columns TimeGenerated=datetime Body=string SeverityText=string

Enable cache

Edge devices in some environments may experience intermittent connectivity due to various factors such as network congestion, signal interference, power outage, or mobility. In these environments, you can configure the edge pipeline to cache data by creating a persistent volume in your cluster. The process for this will vary based on your particular environment, but the configuration must meet the following requirements:

  • The namespace in the volume's metadata must be the same as the namespace of the specified Azure Monitor pipeline instance.
  • Access mode must support ReadWriteMany.

Once the volume is created in the appropriate namespace, configure it using the cache parameters in the pipeline configuration file.
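The two requirements above can be illustrated with a standard Kubernetes PersistentVolumeClaim. The following is a minimal sketch, not the product's documented procedure: it assumes your cluster has a storage class that supports ReadWriteMany, and the claim name, storage class name, and size are hypothetical placeholders.

```shell
# Sketch only: prints a PersistentVolumeClaim manifest that requests
# ReadWriteMany access in the pipeline's namespace.
# The claim name (pipeline-cache-pvc) and size (10Gi) are hypothetical.
pvc_manifest() {
  namespace="$1"       # namespace of the Azure Monitor pipeline instance
  storage_class="$2"   # a storage class in your cluster that supports ReadWriteMany
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pipeline-cache-pvc
  namespace: ${namespace}
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ${storage_class}
  resources:
    requests:
      storage: 10Gi
EOF
}
```

To try it, review the generated manifest and then pipe it to kubectl, for example: pvc_manifest my-pipeline-namespace my-rwx-storage-class | kubectl apply -f -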

Caution

Each replica of the edge pipeline stores data in a location in the persistent volume specific to that replica. Decreasing the number of replicas while the cluster is disconnected from the cloud will prevent that data from being backfilled when connectivity is restored.

Enable and configure pipeline

The current options for enabling and configuring the pipeline are described below.

Configure pipeline using Azure portal

When you use the Azure portal to enable and configure the pipeline, all required components are created based on your selections. This saves you from the complexity of creating each component individually, but you may need to use other methods for more advanced configuration.

Perform one of the following in the Azure portal to launch the installation process for the Azure Monitor pipeline:

  • From the Azure Monitor pipelines (preview) menu, click Create.
  • From the menu for your Arc-enabled Kubernetes cluster, select Extensions and then add the Azure Monitor pipeline extension (preview).

The Basic tab prompts you for the following information to deploy the extension and pipeline instance on your cluster.

Screenshot of Create Azure Monitor pipeline screen.

The settings in this tab are described in the following table.

Property | Description
Instance name | Name for the Azure Monitor pipeline instance. Must be unique for the subscription.
Subscription | Azure subscription in which to create the pipeline instance.
Resource group | Resource group in which to create the pipeline instance.
Cluster name | The Arc-enabled Kubernetes cluster that the pipeline will be installed on.
Custom Location | Custom location for your Arc-enabled Kubernetes cluster. This is automatically populated with the name of a custom location that will be created for your cluster, or you can select another custom location in the cluster.

The Dataflow tab allows you to create and edit dataflows for the pipeline instance. Each dataflow includes the following details:

Screenshot of Create add dataflow screen.

The settings in this tab are described in the following table.

Property | Description
Name | Name for the dataflow. Must be unique for this pipeline.
Source type | The type of data being collected. The following source types are currently supported: Syslog, OTLP.
Port | Port that the pipeline listens on for incoming data. If two dataflows use the same port, they will both receive and process the data.
Log Analytics Workspace | Log Analytics workspace to send the data to.
Table Name | The name of the table in the Log Analytics workspace to send the data to.

Verify configuration

Verify pipeline components running in the cluster

In the Azure portal, navigate to the Kubernetes services menu and select your Arc-enabled Kubernetes cluster. Select Services and ingresses and ensure that you see the following services:

  • <pipeline name>-external-service
  • <pipeline name>-service

Screenshot of cluster components supporting Azure Monitor edge pipeline.

Click on the entry for <pipeline name>-external-service and note the IP address and port in the Endpoints column. This is the external IP address and port that your clients will send data to.

Verify heartbeat

Each pipeline configured in your pipeline instance will send a heartbeat record to the Heartbeat table in your Log Analytics workspace every minute. The contents of the OSMajorVersion column should match the name of your pipeline instance. If there are multiple workspaces in the pipeline instance, then the first one configured will be used.

Retrieve the heartbeat records using a log query as in the following example:

Screenshot of log query that returns heartbeat records for Azure Monitor edge pipeline.
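As a rough command-line equivalent, the following sketch builds a query that filters the Heartbeat table on the OSMajorVersion column and runs it with the Azure CLI. It assumes the CLI is signed in and that the az monitor log-analytics query command is available in your CLI version (it may require an extension). The function names, and the choice to sort and return the ten most recent records, are illustrative assumptions.

```shell
# Sketch only: build a KQL query for heartbeat records of one pipeline instance.
heartbeat_query() {
  pipeline_name="$1"   # name of your pipeline instance
  printf "Heartbeat | where OSMajorVersion == '%s' | sort by TimeGenerated desc | take 10" "$pipeline_name"
}

# Run the query against a Log Analytics workspace.
show_heartbeats() {
  workspace_id="$1"    # workspace ID (GUID) of the Log Analytics workspace
  pipeline_name="$2"
  az monitor log-analytics query \
    --workspace "$workspace_id" \
    --analytics-query "$(heartbeat_query "$pipeline_name")" \
    --output table
}
```

For example, show_heartbeats <workspace-id> my-pipeline should list one record per minute while the pipeline is healthy.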

Client configuration

Once your edge pipeline extension and instance are installed, you need to configure your clients to send data to the pipeline.

Retrieve ingress endpoint

Each client requires the external IP address of the pipeline. Use the following command to retrieve this address:

kubectl get services -n <namespace where azure monitor pipeline was installed>

If the application producing logs is external to the cluster, copy the external-ip value of the service nginx-controller-service with the load balancer type. If the application is on a pod within the cluster, copy the cluster-ip value. If the external-ip field is set to pending, you will need to configure an external IP for this ingress manually according to your cluster configuration.
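The choice between the two addresses can be scripted with kubectl's jsonpath output. This is a minimal sketch: the function name is hypothetical, and it assumes the load balancer reports an IP address (some report a hostname instead, in which case the external lookup prints nothing, as it also does while the address is still pending).

```shell
# Sketch only: print the address clients should use for a given service.
get_service_address() {
  service="$1"     # service name shown by `kubectl get services`
  namespace="$2"   # namespace where the pipeline was installed
  scope="$3"       # "cluster" for pods inside the cluster, anything else for external clients
  if [ "$scope" = "cluster" ]; then
    kubectl get service "$service" -n "$namespace" -o jsonpath='{.spec.clusterIP}'
  else
    kubectl get service "$service" -n "$namespace" -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
  fi
}
```

For example, get_service_address <service name> <namespace> external prints the external IP once the load balancer has assigned one.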

Client | Description
Syslog | Update Syslog clients to send data to the pipeline endpoint and the port of your Syslog dataflow.
OTLP | The Azure Monitor edge pipeline exposes a gRPC-based OTLP endpoint on port 4317. Configuring your instrumentation to send to this OTLP endpoint will depend on the instrumentation library itself. See OTLP endpoint or Collector for OpenTelemetry documentation. The environment variable method is documented at OTLP Exporter Configuration.
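As an illustration of both client types, the sketch below sends a single test Syslog message with the util-linux logger command and sets the standard OpenTelemetry environment variables for an OTLP/gRPC exporter. The IP address and Syslog port are hypothetical placeholders, UDP transport is an assumption (logger also has a --tcp option), and the http:// scheme assumes an unencrypted connection.

```shell
# Sketch only: placeholder values, replace with your own.
PIPELINE_IP="10.0.0.5"   # hypothetical address of the pipeline endpoint
SYSLOG_PORT="514"        # hypothetical port of your Syslog dataflow

# Send one test Syslog message over UDP.
send_test_syslog() {
  logger --server "$PIPELINE_IP" --port "$SYSLOG_PORT" --udp "edge pipeline test message"
}

# Point OpenTelemetry SDKs that honor these variables at the gRPC OTLP endpoint.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://${PIPELINE_IP}:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
```

Call send_test_syslog after sourcing the snippet, then start your instrumented application in the same shell so that it inherits the exported variables.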

Verify data

The final step is to verify that the data is received in the Log Analytics workspace. You can perform this verification by running a query in the Log Analytics workspace to retrieve data from the table.

Screenshot of log query that returns the results of Syslog collection.
