How to create Native Prometheus Alert on Managed AKS/Prometheus

KaushikKumar Patel 41 Reputation points Microsoft Employee
2024-11-08T17:11:34.38+00:00

Hey folks, I have Managed Grafana and Prometheus on AKS. Native open-source Prometheus has been in use for a few years, so we have many custom (application/service) alerting and recording rules set up.

 

Is there any way in Azure to deploy these Prometheus alert rules without using the Azure Prometheus Terraform/Bicep logic?

 

The goal is for each app/service team to manage its own set of alerts, deployed as part of the app deployment alongside other K8s manifests such as HPA, Deployment, KEDA scaleset, ConfigMap, and so on.

 

This is an important requirement for us, since migrating these rules to Terraform/Bicep would add significant operational overhead.

I already have Helm templates for the Prometheus alerts, but I am not sure how to get them applied/deployed to managed AKS/Prometheus.

Azure Kubernetes Service

Accepted answer
  1. Akshay kumar Mandha 3,390 Reputation points Microsoft External Staff Moderator
    2024-11-28T12:28:37.7066667+00:00

    Hi KaushikKumar Patel,
    We understand that you have found an alternative solution to this query. We are reposting your solution as an answer, since the original poster cannot accept their own answer. This way, you can accept it as the answer. An accepted answer helps other community members find the appropriate solution.

    Issue: I have Managed Grafana and Prometheus on AKS. Native open-source Prometheus has been in use for a few years, so we have many custom (application/service) alerting and recording rules set up. Is there any way in Azure to deploy these Prometheus alert rules without using the Azure Prometheus Terraform/Bicep logic?

    The goal is for each app/service team to manage its own set of alerts, deployed as part of the app deployment alongside other K8s manifests such as HPA, Deployment, KEDA scaleset, ConfigMap, and so on.

    This is an important requirement for us, since migrating these rules to Terraform/Bicep would add significant operational overhead. I already have Helm templates for the Prometheus alerts, but I am not sure how to get them applied/deployed to managed AKS/Prometheus.

    Solution:
    Decided to use: https://github.com/Azure/prometheus-collector/tree/main/tools/az-prom-rules-converter

    We will also use the az CLI, which creates the rule group from a supplied YAML file.


    az alerts-management prometheus-rule-group create --name
                                                      --resource-group
                                                      --rules
                                                      --scopes
                                                      [--cluster-name]
                                                      [--description]
                                                      [--enabled {0, 1, f, false, n, no, t, true, y, yes}]
    

    Azure Managed Prometheus Rule Group Argument Details

    --rules [Required] : List<Object> Defines the rules in the Prometheus rule group. See https://github.com/Azure/azure-cli/tree/dev/doc/shorthand_syntax.md for more about shorthand syntax.

    Azure List Element Properties

    • expression [Required] (string) The PromQL expression to evaluate. Evaluated periodically as given by 'interval', and the result recorded as a new set of time series with the metric name as given by 'record'.
    • actions (list of map) Actions that are performed when the alert rule becomes active, and when an alert condition is resolved.
    • alert (string) Alert rule name.
    • annotations (object) The annotations clause specifies a set of informational labels that can be used to store longer additional information such as alert descriptions or runbook links. Only valid for alerts. The annotation values can be templated.
    • enabled (bool) Enable/disable rule. Allowed values: 0, 1, f, false, n, no, t, true, y, yes.
    • for (string) The amount of time alert must be active before firing.
    • labels (object) Labels to add or overwrite before storing the result.
    • record (string) Recorded metrics name.
    • resolve-configuration (map) Defines the configuration for resolving fired alerts. Only relevant for alerts.
      • autoResolved (bool) The flag that indicates whether or not to auto-resolve a fired alert.
      • timeToResolve (string) The duration a rule must evaluate as healthy before the fired alert is automatically resolved, in ISO 8601 duration format. Should be between 1 and 15 minutes.
    • severity (int) The severity of the alerts fired by the rule. Must be between 0 and 4: 0 (critical), 1 (error), 2 (warning), 3 (info), 4 (verbose).
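
The constraints listed above can be checked before a rule is handed to the CLI. The following is a minimal sketch, not part of the Azure tooling: `validate_rule` is a hypothetical helper, the rule is assumed to be a plain dict keyed by the property names listed above, and `timeToResolve` is assumed to be written in whole minutes (for example `PT10M`), which is narrower than the full ISO 8601 duration format.

```python
import re


def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems found in one rule (empty list means OK).

    Hypothetical helper: it only checks the constraints stated in the
    argument details above, not everything the service may enforce.
    """
    problems = []

    # 'expression' is the only required property.
    if not rule.get("expression"):
        problems.append("expression is required")

    # A rule is either an alert rule or a recording rule, not both.
    if "alert" in rule and "record" in rule:
        problems.append("record is not allowed together with alert")

    # Severity must be an integer from 0 (critical) to 4 (verbose).
    severity = rule.get("severity")
    if severity is not None and severity not in range(0, 5):
        problems.append("severity must be between 0 and 4")

    # Assumption: timeToResolve is given in whole minutes, e.g. "PT10M".
    time_to_resolve = rule.get("resolve-configuration", {}).get("timeToResolve")
    if time_to_resolve is not None:
        match = re.fullmatch(r"PT(\d+)M", time_to_resolve)
        if match is None or not 1 <= int(match.group(1)) <= 15:
            problems.append("timeToResolve should be between PT1M and PT15M")

    return problems


if __name__ == "__main__":
    example = {"expression": "up == 0", "alert": "InstanceDown", "severity": 7}
    print(validate_rule(example))
```

Running such a check in each team's pipeline would surface mistakes before the rule group is created, though the service remains the final authority on what it accepts.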

    Native Prometheus Rule Group vs Azure Managed Prometheus Rule Group Comparison

    Here's a table comparing native Prometheus recording and alert rules with Azure Managed Prometheus rule groups:

    | Rule Group Specification | Native Prometheus Rule Group | Azure Managed Prometheus Rule Group |
    | --- | --- | --- |
    | alert (optional) | Name of the alert | Alert rule name (optional) |
    | expr (required) | PromQL expression to trigger alert | PromQL expression (required) |
    | for (optional) | How long the expression must be true before firing | Time alert must be active before firing |
    | keep_firing_for | Duration alert remains active after triggering | Not directly supported |
    | labels | Key-value pairs added as labels | Labels to add or overwrite (object) |
    | annotations | Descriptive information about the alert | Annotations for additional information |
    | record (optional) | Recorded metrics name | Recorded metrics name (not allowed if alert spec is used) |
    | actions (optional) | N/A | Actions performed on alert state changes |
    | enabled (optional) | N/A | Enable/disable rule |
    | resolve-configuration | N/A | Configuration for resolving alerts |
    | autoResolved | N/A | Flag for auto-resolving alerts |
    | timeToResolve | N/A | Duration for auto-resolve (ISO 8601) |
    | severity | N/A | The severity of the alerts fired by the rule. Must be between 0 and 4 |

    Prometheus Rule to Azure Prometheus Rule Group Converter

    A tool to convert Prometheus rules YAML files to an Azure Prometheus rule groups ARM template.
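
To illustrate the kind of field mapping such a conversion involves, here is a rough sketch. It is not the implementation of az-prom-rules-converter. The function names are hypothetical, the `for` value is assumed to need ISO 8601 form on the Azure side, and the mapping from a `severity` label to the Azure severity number is an assumption based on the 0 (critical) to 4 (verbose) scale above.

```python
import re

# Prometheus-style durations limited to d/h/m/s, in descending unit order.
_DURATION_RE = re.compile(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")

# Assumption: teams use these label values; adjust to your own conventions.
SEVERITY_BY_LABEL = {"critical": 0, "error": 1, "warning": 2, "info": 3, "verbose": 4}


def prom_duration_to_iso8601(duration: str) -> str:
    """Convert e.g. '5m' or '1h30m' to 'PT5M' or 'PT1H30M'."""
    match = _DURATION_RE.fullmatch(duration)
    if not duration or match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    days, hours, minutes, seconds = match.groups()
    date_part = f"{int(days)}D" if days else ""
    time_part = "".join(
        f"{int(value)}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    return "P" + date_part + ("T" + time_part if time_part else "")


def convert_rule(native: dict) -> dict:
    """Map one native Prometheus rule (as a dict) to the Azure rule shape."""
    azure = {"expression": native["expr"]}  # expr -> expression
    for key in ("alert", "record", "labels", "annotations"):
        if key in native:
            azure[key] = native[key]
    if "for" in native:
        azure["for"] = prom_duration_to_iso8601(native["for"])
    # keep_firing_for is dropped: the table lists it as not directly supported.
    label = native.get("labels", {}).get("severity")
    if "alert" in native and label in SEVERITY_BY_LABEL:
        azure["severity"] = SEVERITY_BY_LABEL[label]
    return azure


if __name__ == "__main__":
    native_rule = {
        "alert": "HighMemoryUsage",
        "expr": "node_memory_Active_bytes / node_memory_MemTotal_bytes * 100 > 80",
        "for": "5m",
        "labels": {"severity": "critical"},
    }
    print(convert_rule(native_rule))
```

For real migrations the converter tool linked above is the better choice. The sketch only makes the differences in the comparison table concrete.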

    Create a Prometheus rule group using Azure CLI with a rule group file

    az alerts
    

    Create a Prometheus rule group using Azure CLI with rule details in the CLI

    az alerts-management prometheus-rule-group create -n TestPrometheusRuleGroup -g TestResourceGroup \
      -l westus --enabled 
    

    Native Prometheus Rule Group Spec

    Reference

    Customize Azure Managed Prometheus

    • Minimal ingestion profile for Prometheus metrics in Azure Monitor
    • Create and validate custom configuration file for Prometheus metrics in Azure Monitor
    • Customize collection using CRDs (Service and Pod Monitors)
    • Integrate KEDA with your Azure Kubernetes Service cluster


1 additional answer

  1. Akshay kumar Mandha 3,390 Reputation points Microsoft External Staff Moderator
    2024-11-12T08:29:22.55+00:00

    Hi KaushikKumar Patel,
    Thanks for your patience while we reviewed your thread.

    As I understand it, you want to deploy your existing custom rules to AKS without Terraform or Bicep. If that is the case, could you please try the steps below and let us know how it goes?

    1. Create a ConfigMap for Prometheus Rules:

    Store your custom Prometheus rules in a ConfigMap. This ConfigMap will be mounted into the Prometheus container.

    ConfigMap (prometheus-rules-configmap.yaml):

    yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-rules
      namespace: monitoring
    data:
      prometheus.rules: |
        groups:
        - name: custom_rules
          rules:
          - alert: HighMemoryUsage
            expr: node_memory_Active_bytes / node_memory_MemTotal_bytes * 100 > 80
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "High Memory usage detected"
              description: "Memory usage is above 80% for more than 5 minutes."
    
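    2. Modify Prometheus Deployment to Mount ConfigMap:

    Update your Prometheus deployment to mount the ConfigMap containing the custom rules. For the rules to load, the rule_files setting in prometheus.yml must also include the mounted path (here /etc/prometheus/rules/prometheus.rules.yml).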

    Prometheus Deployment (prometheus-deployment.yaml):

    yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus
      namespace: monitoring
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          containers:
          - name: prometheus
            image: prom/prometheus
            args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--web.enable-lifecycle"
            volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus
            - name: prometheus-rules
              mountPath: /etc/prometheus/rules
          volumes:
          - name: prometheus-config
            configMap:
              name: prometheus-config
          - name: prometheus-rules
            configMap:
              name: prometheus-rules
              items:
              - key: prometheus.rules
                path: prometheus.rules.yml
     
    

    Use kubectl apply to deploy the ConfigMap and update the Prometheus deployment in your AKS cluster.

    Please let us know if you face any difficulties and we will help as needed. If you have any other questions, please tag me in a comment.
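
If each team generates its manifests from code rather than writing them by hand, the ConfigMap from step 1 could be produced along these lines. This is only a sketch: `rules_configmap` is a hypothetical helper, and it relies on JSON being accepted wherever YAML is expected, both for the manifest passed to kubectl and for the rule file content.

```python
import json


def rules_configmap(name: str, namespace: str, groups: list[dict]) -> dict:
    """Wrap Prometheus rule groups in a ConfigMap manifest (as a dict)."""
    # ConfigMap data values must be strings, so the rule file is serialized.
    rules_document = json.dumps({"groups": groups}, indent=2)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"prometheus.rules": rules_document},
    }


if __name__ == "__main__":
    group = {
        "name": "custom_rules",
        "rules": [
            {
                "alert": "HighMemoryUsage",
                "expr": "node_memory_Active_bytes / node_memory_MemTotal_bytes * 100 > 80",
                "for": "5m",
                "labels": {"severity": "critical"},
            }
        ],
    }
    # The printed JSON can be piped to: kubectl apply -f -
    print(json.dumps(rules_configmap("prometheus-rules", "monitoring", [group]), indent=2))
```

The key name prometheus.rules matches the `items` entry in the deployment above, so the generated ConfigMap mounts the same way as the hand-written one.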
