An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.
AKS cluster auto-upgrade did not execute during scheduled maintenance window
Service: Azure Kubernetes Service Impacted Resource: /subscriptions/2e165671-918f-427f-8e13-3ec8e39812e6/resourceGroups/rg-infra-r-westus3-tst/providers/Microsoft.ContainerService/managedClusters/clu-r-usw3-tst Description: Our AKS cluster clu-r-usw3-tst…
Azure Kubernetes Service
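For the auto-upgrade question above, a first sanity check is whether the expected run time actually falls inside the configured weekly window once everything is converted to UTC. The helper below, `in_weekly_window`, is hypothetical. It models a simple weekly schedule (day of week, start hour, duration in hours) and does not read the cluster's real maintenance configuration.

```python
from datetime import datetime, timedelta, timezone


# Hypothetical helper: models a simple weekly window, not the AKS maintenance configuration API.
def in_weekly_window(ts: datetime, day_of_week: int, start_hour: int, duration_hours: int) -> bool:
    """Return True if the timezone-aware `ts` falls inside the weekly window.

    day_of_week uses Python's convention (Monday=0 ... Sunday=6); comparisons are in UTC.
    """
    ts = ts.astimezone(timezone.utc)
    # Most recent window start on the configured weekday, at or before ts.
    days_back = (ts.weekday() - day_of_week) % 7
    start = (ts - timedelta(days=days_back)).replace(
        hour=start_hour, minute=0, second=0, microsecond=0
    )
    if start > ts:
        # Same weekday but before the start hour: the relevant window began a week earlier.
        start -= timedelta(days=7)
    return start <= ts < start + timedelta(hours=duration_hours)
```

Because the start is searched backwards from `ts`, a window that crosses midnight (for example Saturday 22:00 for 4 hours) is handled without special cases.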
Unexpected AKS Fleet creation/billing by ContainerService after months of inactivity
Hi everyone, I’m running into a strange ghost billing issue and wanted to see if anyone else has experienced this kind of control plane behavior. I have not touched my Azure environment (portal or CLI) in over two months. However, I recently received an…
[SR15] speech2-dev AKS clusters calling dSMS via Load Balancer egress IPs — requesting guidance on avoiding dSMS calls for dev environments
Related IcM: Incident 803310234 : [Emergency Broadcast]: AzRel Red Flag | MountainPass SR15 | Dsms/Dsts Service Tag - 11ed9226-335e-4d08-a623-4547014ba2cc Summary MountainPass SR15 requires all IPs calling dSMS/dSTS APIs to have registered Service Tags.…
P2S VPN: private AKS API works, internal ingress LB times out
On Point-to-Site VPN we can reach the private AKS API and use kubectl, but we cannot reach other VNet IPs, including our internal ingress load balancer. Browser/curl to the app…
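When the private API endpoint answers over the VPN but the internal load balancer does not, one possibility worth ruling out is that the load balancer's frontend IP is not covered by the routes the VPN client installs. The sketch below only does the prefix arithmetic; the route list and IPs in the usage are hypothetical placeholders, and `routed_prefix` is an illustrative name.

```python
import ipaddress
from typing import List, Optional


# Hypothetical helper: checks whether an IP is covered by any of the given route prefixes.
def routed_prefix(target_ip: str, vpn_routes: List[str]) -> Optional[str]:
    """Return the first prefix in vpn_routes that contains target_ip, or None."""
    ip = ipaddress.ip_address(target_ip)
    for prefix in vpn_routes:
        # strict=False tolerates prefixes written with host bits set, e.g. "10.1.0.4/24".
        if ip in ipaddress.ip_network(prefix, strict=False):
            return prefix
    return None
```

For example, with client routes `["10.1.0.0/24", "172.16.0.0/16"]`, an API endpoint at `10.1.0.4` is covered while a load balancer at `10.2.0.10` is not, which would match the symptom described.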
Network issue on specific AKS node
Hello, A specific instance in my D4ads_v7 node pool (of 7 nodes) started experiencing outbound connection issues at about 5AM EST this weekend. This issue continued until I drained the node and a new one was spun up. No other nodes experienced this…
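To confirm that outbound failures are confined to one node, a small TCP probe can be run from a pod pinned to each node (for example with a `nodeSelector` on the `kubernetes.io/hostname` label) and the results compared. This is a sketch: `probe_outbound` is a hypothetical helper, and it tests only TCP reachability, not the cause of a failure.

```python
import socket
from typing import Dict, Iterable, Tuple


# Hypothetical diagnostic helper: run it on each node and compare which targets fail.
def probe_outbound(targets: Iterable[Tuple[str, int]], timeout: float = 3.0) -> Dict[Tuple[str, int], bool]:
    """Attempt a TCP connection to each (host, port); True means the handshake completed."""
    results: Dict[Tuple[str, int], bool] = {}
    for host, port in targets:
        try:
            # Resolves the host name and performs a TCP handshake within `timeout` seconds.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # DNS errors, refused connections and timeouts are all OSError subclasses.
            results[(host, port)] = False
    return results
```

The targets passed in would be whichever endpoints the affected node could not reach; a result that differs only on one node points at that node rather than at the destination.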
Persistent "Failed to allocate pool: Failed to delegate" Errors Across 160 Clusters
Summary We are experiencing a platform-wide, persistent issue where pods fail to start due to the Azure CNI plugin (azure-vnet) being unable to allocate IP addresses. The error message is: Failed to create pod sandbox: rpc error: code = Unknown desc =…
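With the error spread over 160 clusters, it can help to quantify where it occurs. The sketch below tallies kubelet `FailedCreatePodSandBox` events whose message contains "Failed to allocate pool", per node, from the JSON printed by `kubectl get events -A -o json`. The function name is hypothetical, and on newer clusters the `count` field may be absent, in which case each event counts once.

```python
import json
from collections import Counter


# Hypothetical helper: tallies matching sandbox-creation failures per node.
def count_pool_errors(events_json: str, needle: str = "Failed to allocate pool") -> Counter:
    data = json.loads(events_json)
    per_node: Counter = Counter()
    for ev in data.get("items", []):
        if ev.get("reason") != "FailedCreatePodSandBox":
            continue
        if needle not in (ev.get("message") or ""):
            continue
        # Events reported by the kubelet carry the node name in source.host.
        node = (ev.get("source") or {}).get("host") or "<unknown>"
        # `count` aggregates repeats; fall back to 1 when it is missing or null.
        per_node[node] += ev.get("count") or 1
    return per_node
```

Running it per cluster and comparing the totals shows whether the failures cluster on particular nodes or node pools or are evenly spread.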
Adding rows in a Grafana dashboard - AKS cluster
I am trying to add rows but cannot find the option for rows. Can you please provide navigation steps?
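If the UI option is hard to find, a row can also be added by editing the dashboard's JSON model, where a row is itself a panel of type `"row"` on Grafana's 24-column grid. The sketch assumes that standard dashboard JSON layout (`panels`, `gridPos`); `add_row` is a hypothetical helper, and the edited JSON would be pasted back through the dashboard's JSON model settings.

```python
# Hypothetical helper operating on a dashboard JSON model loaded as a dict.
def add_row(dashboard: dict, title: str) -> dict:
    """Append an expanded row panel below all existing panels."""
    panels = dashboard.setdefault("panels", [])
    # Place the row directly under the lowest existing panel.
    next_y = max(
        (p["gridPos"]["y"] + p["gridPos"]["h"] for p in panels if "gridPos" in p),
        default=0,
    )
    next_id = max((p.get("id", 0) for p in panels), default=0) + 1
    panels.append({
        "type": "row",
        "title": title,
        "id": next_id,
        "collapsed": False,
        # Rows span the full 24-column width and are one grid unit tall.
        "gridPos": {"h": 1, "w": 24, "x": 0, "y": next_y},
        # Only used to hold child panels while the row is collapsed.
        "panels": [],
    })
    return dashboard
```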
How should I allow Argo Workflow pods egress to the Kubernetes API in a NetworkPolicy, and can the API endpoint change?
We run Argo Workflows in a namespace with a default-deny egress NetworkPolicy. The workflow pod needs to post to workflowtaskresults on the Kubernetes API, but it fails with a transient connection error when trying to reach the API server. I’m trying to…
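A minimal sketch of an egress allowance to the API server under a default-deny policy is below, built as a manifest dict that can be serialized with `json.dumps` and applied with `kubectl apply -f`. The CIDR `203.0.113.10/32` is a documentation placeholder, and the policy name is hypothetical. Which address must actually be matched (the `kubernetes` Service ClusterIP or the API server endpoint after translation) depends on the network policy engine, and whether that endpoint can change is the open part of the question.

```python
from typing import List


# Hypothetical builder for a networking.k8s.io/v1 NetworkPolicy manifest.
def api_egress_policy(namespace: str, api_server_cidrs: List[str], port: int = 443) -> dict:
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-apiserver-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: applies to every pod in the namespace
            "policyTypes": ["Egress"],
            "egress": [
                {
                    # TCP to the API server address(es) only.
                    "to": [{"ipBlock": {"cidr": c}} for c in api_server_cidrs],
                    "ports": [{"protocol": "TCP", "port": port}],
                },
                {
                    # No "to" clause: DNS on port 53 to any destination.
                    "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
                },
            ],
        },
    }
```

NetworkPolicies are additive, so this allowance sits alongside the existing default-deny egress policy rather than replacing it.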
AKS can't pull image from ACR, always fails with 401 Unauthorized error
Hello, I need help with a persistent 401 Unauthorized error when my AKS cluster tries to pull an image from an Azure Container Registry (ACR) from a different resource group (same subscription). Whenever the cluster pulls the image from the ACR, it fails…
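For a managed-identity cluster, image pulls use the kubelet identity, which needs the AcrPull role scoped to the registry itself; a registry in another resource group just means a different scope string. The sketch assembles the relevant Azure CLI commands; the resource names and `<subscription-id>` are hypothetical placeholders, and `identityProfile` is only populated when the cluster uses managed identity rather than a service principal.

```python
import shlex
from typing import List


def acr_resource_id(subscription_id: str, resource_group: str, registry: str) -> str:
    """ARM resource ID of a container registry (its resource group may differ from the cluster's)."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.ContainerRegistry/registries/{registry}")


# Hypothetical helper that only builds command lines; it does not call Azure.
def acrpull_commands(subscription_id: str, cluster_rg: str, cluster: str,
                     acr_rg: str, registry: str) -> List[List[str]]:
    scope = acr_resource_id(subscription_id, acr_rg, registry)
    return [
        # Object ID of the kubelet identity, which is the identity that pulls images.
        ["az", "aks", "show", "-g", cluster_rg, "-n", cluster,
         "--query", "identityProfile.kubeletidentity.objectId", "-o", "tsv"],
        # Grant AcrPull on the registry; substitute the object ID returned above.
        ["az", "role", "assignment", "create", "--assignee", "<kubelet-object-id>",
         "--role", "AcrPull", "--scope", scope],
        # Validate pull access from the cluster's side.
        ["az", "aks", "check-acr", "-g", cluster_rg, "-n", cluster, "--acr", f"{registry}.azurecr.io"],
    ]


for cmd in acrpull_commands("<subscription-id>", "rg-aks", "my-aks", "rg-acr", "myregistry"):
    print(shlex.join(cmd))
```

`az aks update --attach-acr` performs an equivalent role assignment in one step when the caller has permission to assign roles on the registry.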
AKS AGIC Ingress Not Getting Public IP Address Even Though Application Gateway Is Running
My application deployment, service, and ingress are all created successfully. Pods are running correctly. The AGIC pod is also running in the kube-system namespace. However, when I run kubectl get ingress, I see no IP: kubectl get ingress NAME …
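AGIC only processes ingresses whose class it watches (commonly the `azure-application-gateway` IngressClass, or the legacy `kubernetes.io/ingress.class: azure/application-gateway` annotation), so a quick check is to list every ingress without an address together with its class. The sketch parses `kubectl get ingress -A -o json`; the helper name is hypothetical, and the expected class should be confirmed against the AGIC configuration in use.

```python
import json
from typing import List, Optional, Tuple


# Hypothetical helper: reports ingresses whose status has neither an IP nor a hostname.
def ingresses_without_address(ingress_list_json: str) -> List[Tuple[str, Optional[str]]]:
    data = json.loads(ingress_list_json)
    missing = []
    for ing in data.get("items", []):
        entries = ing.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        if any(e.get("ip") or e.get("hostname") for e in entries):
            continue
        meta = ing.get("metadata", {})
        # The class comes from spec.ingressClassName or, on older manifests, an annotation.
        ingress_class = (ing.get("spec", {}).get("ingressClassName")
                         or meta.get("annotations", {}).get("kubernetes.io/ingress.class"))
        missing.append((f'{meta.get("namespace", "default")}/{meta.get("name", "?")}', ingress_class))
    return missing
```

An ingress listed with no class (`None`) or with a class that AGIC does not watch would explain an address that never appears even though the gateway is running.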
Not able to create AKS Cluster using Azure Pass Subscription
Not able to create AKS Cluster using Azure Pass Subscription {"code":"InvalidTemplateDeployment","details":[{"code":"ErrCode_InsufficientVCPUQuota","message":"Preflight validation check for…
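The `ErrCode_InsufficientVCPUQuota` preflight failure comes down to arithmetic: node count times vCPUs per VM size must fit in the remaining quota. The same check applies both to the regional total and to the VM-family quota, whose current values can be listed with `az vm list-usage --location <region>`. The quota figures in the usage are hypothetical, not Azure Pass limits.

```python
# Hypothetical helper: how many vCPUs short a node pool request is (0 means it fits).
def vcpu_shortfall(node_count: int, vcpus_per_node: int, quota_limit: int, current_usage: int = 0) -> int:
    required = node_count * vcpus_per_node
    # Quota already consumed by other VMs in the region or family reduces what is left.
    available = max(quota_limit - current_usage, 0)
    return max(required - available, 0)
```

For example, three 4-vCPU nodes need 12 vCPUs, so against a hypothetical limit of 10 the call `vcpu_shortfall(3, 4, 10)` returns 2; a smaller VM size or fewer nodes brings the request under the limit without a quota increase.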
New AKS cluster networking never fully initialized, causing our apps and kube-system apps to crash
I created a new AKS cluster with the same bicep template that I've used in all my other regions. My apps that need to communicate with apiserver, and several apps from the kube-system and gatekeeper-system namespaces are failing to come up. Here's a…
AKS Node Image Upgrade Failures
On May 10th, the scheduled node image upgrades failed across both lower environment and production Kubernetes clusters. The activity logs show only the following message: “Upgrade Failed with status Unspecified, error: Unknown error” Could you help…
Azure Kubernetes Service A100 node suddenly went down several times
Hi, I am using Azure Kubernetes Service, and I have already set up several A100 nodes to match our requirements. But sometimes, during working hours, an A100 node suddenly crashes. When I checked the activity log, I can see that there are some activity…
AKS - Difference between on-box logs and logs in the portal
What logs are provided via the portal versus the logs provided on the nodes themselves? Do I need to get all logs from both places?
Periods of API server downtime caused by internal server errors in underlay cluster
Across a number of AKS clusters, we have been experiencing reduced availability over recent months. This is characterised by periods of a few minutes where the API server is returning either 401 or 500 errors (depending on the API call being made).…
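To put numbers on these "periods of a few minutes", probe results (for example from a loop that requests the API server's `/readyz` endpoint and records the HTTP status) can be grouped into contiguous failure windows. The sampling loop is not shown; `error_windows` is a hypothetical helper, and it treats 401 and 500 as failures because those are the codes reported in the question.

```python
from datetime import datetime
from typing import FrozenSet, Iterable, List, Tuple


# Hypothetical helper: collapses consecutive failing samples into (first, last) windows.
def error_windows(samples: Iterable[Tuple[datetime, int]],
                  bad_codes: FrozenSet[int] = frozenset({401, 500})) -> List[Tuple[datetime, datetime]]:
    windows = []
    start = last = None
    for ts, status in sorted(samples):
        if status in bad_codes:
            if start is None:
                start = ts  # first failure of a new window
            last = ts
        elif start is not None:
            # A healthy sample closes the current window.
            windows.append((start, last))
            start = last = None
    if start is not None:
        windows.append((start, last))  # window still open at the end of the data
    return windows
```

Window start times and durations collected this way across several clusters make it easier to correlate the outages with each other and with platform events.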
AKS admission-webhook delivery failing intermittently via konnectivity
During a typical deploy (a sequence of helm chart installs), apiserver → pod admission-webhook calls intermittently fail with one of three errors. All three surface via the konnectivity tunnel (localhost:9443 is the konnectivity-server on the apiserver…
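While the konnectivity failures are being investigated, one client-side mitigation is to retry each deploy step with exponential backoff. This masks the intermittent webhook errors rather than explaining them, and it only makes sense for idempotent steps such as `helm upgrade --install`. The helper name and its defaults are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Hypothetical helper: re-runs `step` on any exception, waiting longer after each failure.
def retry_with_backoff(step: Callable[[], T], attempts: int = 5, base_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return step()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # base_delay * 2**attempt (2s, 4s, 8s, ... by default) plus up to 1s of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise AssertionError("unreachable")
```

A helm step could be wrapped as `retry_with_backoff(lambda: subprocess.run([...], check=True))`, since `check=True` turns a non-zero exit code into an exception that triggers the retry.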
AKS Metric and Log Collection
I need guidance and information on what exactly is available to us and if we can get this into our Grafana/Prometheus stack. Using the first screenshot on this page: https://learn.microsoft.com/en-us/azure/aks/monitor-aks?tabs=cilium ; we need more…
AKS GPU Node Autoscaling Delay for vLLM LLM Workloads on A100 Nodes
I am trying to implement autoscaling for an AKS-based LLM inference workload using vLLM, where each replica serves a GPT OSS 120B model using tensor parallelism (tensor-parallel-size=4) across 4 A100 GPUs (Standard_NC96ads_A100_v4). Current setup: AKS…
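With `tensor-parallel-size=4` and 4 A100 GPUs per Standard_NC96ads_A100_v4 node, each replica occupies a whole node, so every scale-out step waits for a new GPU node to be provisioned and the model to load. The arithmetic is sketched below under the assumption that a replica's GPUs must sit on a single node; `gpu_nodes_needed` is a hypothetical helper.

```python
import math


# Hypothetical helper: minimum nodes when each replica's GPUs must all be on one node.
def gpu_nodes_needed(replicas: int, gpus_per_replica: int, gpus_per_node: int) -> int:
    if gpus_per_replica > gpus_per_node:
        raise ValueError("a replica does not fit on a single node")
    replicas_per_node = gpus_per_node // gpus_per_replica
    return math.ceil(replicas / replicas_per_node)
```

For the setup described, `gpu_nodes_needed(3, 4, 4)` is 3, one node per replica, whereas a hypothetical 2-GPU replica would pack two per node and give `gpu_nodes_needed(3, 2, 4) == 2`.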