Dimitris Bratsos, based on the timeline and events you shared, I reviewed the node events and would like to clarify the behavior observed. The node failure was sudden and unplanned: no drain or maintenance activity was initiated on the node beforehand. This is confirmed by the absence of any drain or cordon events before the node transitioned to the NotReady state.
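If you want to repeat this check yourself, the following is a minimal Python sketch rather than the exact procedure used on the support side. It assumes the event list comes from the `items` array of `kubectl get events --all-namespaces -o json`, that a cordon shows up as a node event with reason `NodeNotSchedulable`, and that the failure shows up as `NodeNotReady`. The function name `cordon_before_not_ready` is hypothetical.

```python
from datetime import datetime

# Assumption: cordoning a node (which "kubectl drain" does first) produces a
# node event with this reason. Adjust the set if your cluster reports others.
CORDON_REASONS = {"NodeNotSchedulable"}
NOT_READY_REASON = "NodeNotReady"


def parse_ts(value):
    """Parse an RFC 3339 timestamp such as 2024-05-01T10:05:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def cordon_before_not_ready(events, node_name):
    """Return True if a cordon event precedes the first NotReady event,
    False if none does, and None if the node has no NotReady event."""
    node_events = [
        e for e in events
        if e.get("involvedObject", {}).get("kind") == "Node"
        and e.get("involvedObject", {}).get("name") == node_name
        and e.get("lastTimestamp")  # skip events without a usable timestamp
    ]
    not_ready_times = [
        parse_ts(e["lastTimestamp"])
        for e in node_events
        if e.get("reason") == NOT_READY_REASON
    ]
    if not not_ready_times:
        return None
    first_not_ready = min(not_ready_times)
    return any(
        e.get("reason") in CORDON_REASONS
        and parse_ts(e["lastTimestamp"]) <= first_not_ready
        for e in node_events
    )
```

A result of `False` for the affected node matches what is described above: the node went NotReady without a preceding cordon. Keep in mind that Kubernetes events expire, so this only works while the events are still retained.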
After the node went down, Kubernetes automatically initiated pod eviction as part of its standard recovery mechanism. This is expected behavior: the platform tries to move workloads off an unhealthy node.
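To estimate how long a pod stays bound to the failed node before eviction starts, you can look at its tolerations. Upstream Kubernetes adds tolerations for the `node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable` taints with `tolerationSeconds: 300` by default, unless the pod defines its own; your workloads may be configured differently. The sketch below is a simplified illustration that reads a pod object as returned by `kubectl get pod <name> -o json`. The function name `eviction_delay_seconds` is hypothetical, and the match ignores the toleration `value` field.

```python
def eviction_delay_seconds(pod, taint_key="node.kubernetes.io/not-ready"):
    """Estimate how long a pod tolerates a NoExecute taint with this key.

    Returns 0 if no toleration matches (eviction is not delayed),
    None if the taint is tolerated without a time limit,
    otherwise the smallest tolerationSeconds among matching tolerations.
    Simplification: the toleration "value" field is not compared.
    """
    matched = False
    delays = []
    for tol in pod.get("spec", {}).get("tolerations", []):
        key = tol.get("key")
        # An empty key with operator Exists matches every taint key.
        key_matches = key == taint_key or (not key and tol.get("operator") == "Exists")
        # An empty effect matches every taint effect.
        effect_matches = tol.get("effect") in (None, "", "NoExecute")
        if key_matches and effect_matches:
            matched = True
            if tol.get("tolerationSeconds") is not None:
                delays.append(max(0, tol["tolerationSeconds"]))
    if not matched:
        return 0
    return min(delays) if delays else None
```

For a pod carrying only the default tolerations this returns 300, which would explain a delay of about five minutes between the node becoming NotReady and the eviction starting.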
Regarding the observed PVC (Persistent Volume Claim) spike, this is not due to increased disk usage. Instead, it is related to temporary disk attachment contention. Since the affected node became unresponsive, the disks attached to it could not be immediately detached. When Kubernetes attempted to reschedule workloads onto another node, the disk reattachment failed initially because the disks were still logically attached to the original node.
This condition persisted for a short duration until the platform completed backend remediation and safely detached the disks. Once this process was completed, the disks were successfully reattached to the new node and the workloads resumed normal operation.
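If you see this again, one way to confirm the attachment contention described above is to inspect the cluster's VolumeAttachment objects while the pods are pending. The sketch below is only an illustration. It assumes the input is the `items` array from `kubectl get volumeattachment -o json` and relies on the `spec.nodeName`, `spec.source.persistentVolumeName` and `status.attached` fields of the `storage.k8s.io/v1` API. The function name `attachments_on_node` and the node and volume names in the usage note are hypothetical.

```python
def attachments_on_node(volume_attachments, node_name):
    """List persistent volumes still recorded as attached to node_name.

    While the failed node appears here, a ReadWriteOnce disk cannot be
    attached to the node where the replacement pod was scheduled.
    """
    still_attached = []
    for va in volume_attachments:
        spec = va.get("spec", {})
        status = va.get("status", {})
        if spec.get("nodeName") == node_name and status.get("attached"):
            still_attached.append(spec.get("source", {}).get("persistentVolumeName"))
    return still_attached
```

For example, calling `attachments_on_node(items, "failed-node")` during the incident window would be expected to return the affected volumes, and an empty list once the backend detach has completed.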
In summary, the behavior observed (PVC spike and workload delay) is consistent with an unexpected node failure followed by standard recovery and disk reattachment handling, and not due to any pre-planned drain or workload-triggered spike.