Azure Batch - Nodes will not leave the pool (stuck in leavingpool state)
We have had an Azure Batch system running for years. Recently we have seen issues where our node pools scale up without any problem but then fail to scale down, with nodes stuck in the 'leavingpool' state for hours.
This is happening right now, and I have no way to remove the nodes short of deleting the pool (and I suspect even that would fail).
Has anyone ever encountered this? I'm worried our company is being charged for machines we aren't using, which is extremely alarming for a fully automated system.
Edit: The nodes finally left the pool after 4 hours. I'm still interested in why this occurs, though, and whether the customer is billed for the time a node spends in the leavingpool state.