Pod dataprotection-microsoft-kubernetes-agent-upgrade-crds status pending
The pod dataprotection-microsoft-kubernetes-agent-upgrade-crds has been pending for 15 days with the error message "0/2 nodes are available: 2 Insufficient cpu. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod."
Every scheduled backup currently fails.
Furthermore, our team received no error notifications about this.
Azure Backup
Azure Kubernetes Service (AKS)
-
Rajat Shrivastava 0 Reputation points • Microsoft Employee
2023-11-13T09:21:21.1933333+00:00 Hi @Francesco Landoni , this error appears when there is not sufficient compute capacity available within the cluster to get the data protection pods up. Please increase the number of nodes in the cluster and then check if the pods are up and running under the namespace "dataprotection-microsoft". Once the pods are up, the scheduled backups should start running.
-
SadiqhAhmed-MSFT 41,716 Reputation points • Microsoft Employee
2023-11-14T09:39:05.9366667+00:00 @Francesco Landoni Have you had a chance to see the previous response? If the suggestions were helpful, click “Accept Answer” and Up-Vote. Feel free to reach out to us if you've additional questions in this regard.
-
Francesco Landoni 20 Reputation points
2023-11-14T12:23:48.4633333+00:00 Thanks for the reply.
The current situation is that the nodes have an average CPU consumption below 30%, but the CPU request of the upgrade pod, added to the requests of the existing pods, exceeded what the nodes could offer, so the pod could not be scheduled.
All the requests that make it impossible to complete the task come from system (Azure) pods and are not linked to the applications that are deployed.
The size chosen for the infrastructure is E2sv3: 2 nodes with 2 cores each (4 cores in total).
I resolved the deadlock by creating a new node with 4 cores and removing it once the update had finished.
The pod had a run time of a few seconds.
Once the update was complete it was possible to create a backup manually.
I subsequently disabled automatic updating to avoid this situation recurring in the future.
In fact the imposed request of ~500m for the dataprotection-microsoft-kubernetes-agent-upgrade-crds pod makes it impossible to use automatic upgrade for the E2sv3 family.
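The gap between low CPU usage and a pod that still cannot be scheduled can be illustrated with a minimal sketch. The Kubernetes scheduler checks whether the sum of CPU requests fits within a node's allocatable CPU; live utilisation plays no part in that check. The allocatable and already-requested figures below are hypothetical and do not come from this thread; only the ~500m request for the upgrade pod does.

```python
def fits(allocatable_m: int, requested_m: int, incoming_m: int) -> bool:
    """CPU fit test based on requests only; live CPU usage is not an input."""
    return requested_m + incoming_m <= allocatable_m


# Hypothetical figures for one 2-core node (assumptions, not from the thread):
ALLOCATABLE_M = 1900        # assumed allocatable CPU after system reservations
EXISTING_REQUESTS_M = 1500  # assumed sum of requests of pods already on the node

# Approximate request of the upgrade pod, as reported in the thread.
UPGRADE_POD_M = 500

free_m = ALLOCATABLE_M - EXISTING_REQUESTS_M
print(f"unrequested CPU on the node: {free_m}m")
print("upgrade pod fits:", fits(ALLOCATABLE_M, EXISTING_REQUESTS_M, UPGRADE_POD_M))
```

With these assumed numbers the node has 400m left to hand out, so a 500m request is rejected even though the node may be mostly idle.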
-
Rajat Shrivastava 0 Reputation points • Microsoft Employee
2023-11-14T13:35:33.4833333+00:00 Thanks @Francesco Landoni for bringing this to our notice. Let me take this back to my team to see what we can do in this regard.
-
Anshul Ahuja 5 Reputation points • Microsoft Employee
2023-11-15T04:06:32.1+00:00 Hi @Francesco Landoni
I am from the product team.
We have now reduced the CPU request of the Job to 50m, and we have made other significant improvements to fix the upgrade issues.
I suggest you upgrade to the latest extension version (0.0.2496-177), whose release we recently completed. I also recommend setting your extension back to auto upgrade so that you receive improvements and bug fixes promptly, given that these upgrade worries should now be resolved.
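To put the change in perspective, here is a small sketch that converts Kubernetes CPU quantities to millicores and compares the old and new requests against one node. It assumes the reduced figure is 50 millicores (written 50m in Kubernetes notation) and uses the raw 2 cores per node mentioned earlier in the thread; the helper name `parse_cpu` is made up for this example.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m", "2" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


# Raw CPU of one node, from the 2 cores per node stated in the thread.
NODE_CPU_M = parse_cpu("2")

# "before" is the ~500m reported by the user; "after" assumes 50m (see lead-in).
for label, request in (("before", "500m"), ("after", "50m")):
    millicores = parse_cpu(request)
    share = millicores / NODE_CPU_M
    print(f"{label}: {millicores}m is {share:.1%} of a 2-core node")
```

Under that assumption the Job's request drops from a quarter of a node's raw CPU to 2.5% of it, which leaves far more room on small nodes.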
-
Francesco Landoni 20 Reputation points
2023-11-15T07:35:43.4566667+00:00 Hi @Anshul Ahuja ,
Thanks for the reply.
What should I update to version 0.0.2496-177?
Is it possible to do it manually before reactivating automatic updating?
Thanks in advance and best regards.
-
Anshul Ahuja 5 Reputation points • Microsoft Employee
2023-11-15T10:03:25.5433333+00:00 Without getting into the complications of setting versions explicitly, I'd suggest simply turning auto upgrade on, even if you want a manual upgrade; it should update to the latest version within 30 minutes to an hour.
After that, if you see any issues, you can disable auto upgrade and reach out here, or leave it enabled if you are happy with the current behavior.
-
SadiqhAhmed-MSFT 41,716 Reputation points • Microsoft Employee
2023-11-20T06:26:23.9633333+00:00 @Francesco Landoni Any update on this?
-
Francesco Landoni 20 Reputation points
2023-11-20T14:54:03.8133333+00:00 Unfortunately I don't have any updates at the moment.
I will try to arrange a time for internal sharing at the end of this week.
-
Francesco Landoni 20 Reputation points
2023-11-23T15:51:06.78+00:00 Next week we will try switching automatic updating back on.
I would also like to point out that we noticed the malfunction only through a manual check of the infrastructure; we never received any error notification about it.
-
SadiqhAhmed-MSFT 41,716 Reputation points • Microsoft Employee
2023-11-24T07:33:37.55+00:00 @Francesco Landoni Keep us posted on the progress.
-
SadiqhAhmed-MSFT 41,716 Reputation points • Microsoft Employee
2023-12-04T09:18:41.2233333+00:00 @Francesco Landoni We haven't heard back from you yet. Please let us know if you have any update to share.
-
Francesco Landoni 20 Reputation points
2023-12-04T09:53:23.61+00:00 Unfortunately we encountered further problems when restoring the backup.
Until that phase is successfully concluded this activity will remain suspended.
Best regards.
-
SadiqhAhmed-MSFT 41,716 Reputation points • Microsoft Employee
2023-12-06T14:08:59.37+00:00 @Francesco Landoni Could you please elaborate on the exact problem you encountered?
-
SadiqhAhmed-MSFT 41,716 Reputation points • Microsoft Employee
2023-12-08T06:41:40.3766667+00:00 @Francesco Landoni We haven't heard back from you. Please reply with more details if you have any questions in this matter and we will gladly continue the discussion.