Hi @Tomasz Rozwadowski
Thanks for posting your question,
A few points that I hope help:
The B-series SKUs are indeed not ideal for AKS nodes: they are burstable, so resource availability (especially CPU) is inconsistent, and you can experience issues in your application that are harder to replicate or troubleshoot later.
For system node pools, the recommendation is at least 4 vCPUs, so Standard_DS4_v2 is a common choice, though the right size ultimately comes down to the size of your cluster. Sustained 80% CPU usage is not healthy: it can easily spike to full utilization and leave processes stuck in iowait. I would suggest upgrading that node pool, at least in vCPU count, for example as sketched below.
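Since the VM size of an existing node pool cannot be changed in place, the usual path is to add a new system-mode pool with the larger SKU and then drain/delete the old one. A minimal sketch (resource group, cluster, and pool names are placeholders for yours):

```bash
# Add a new system-mode node pool on a larger SKU
# (myResourceGroup / myAKSCluster / sysnp2 are placeholder names)
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name sysnp2 \
  --mode System \
  --node-vm-size Standard_DS4_v2 \
  --node-count 3
```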
For the second scenario, if you are not using them already, I would consider VMs with ephemeral OS disks to rule out disk IO as the bottleneck, and then re-evaluate memory usage on the nodes to see whether a larger SKU is needed. Keep in mind the difference between the RSS and working set memory metrics; check the link below for more information.
https://learn.microsoft.com/en-us/azure/virtual-machines/ephemeral-os-disks
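As far as I know, the OS disk type can only be set when a node pool is created, so this again means adding a new pool. A rough sketch (names and sizes are placeholders; the ephemeral OS disk must fit in the VM's cache or temp storage, so check your SKU's limits):

```bash
# Add a user node pool backed by an ephemeral OS disk
# (names and the 30 GB OS disk size are placeholder values)
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name userpool2 \
  --node-vm-size Standard_DS3_v2 \
  --node-osdisk-type Ephemeral \
  --node-osdisk-size 30
```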
You might also want to consider pod affinity and anti-affinity rules to better distribute your resource-intensive workloads, e.g. to make sure that two pods of a memory-heavy deployment are not scheduled on the same node; see the sketch after this paragraph.
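A minimal sketch of a preferred anti-affinity rule on a deployment (the app name, labels, and image are placeholders):

```yaml
# Spread replicas of a memory-heavy app across nodes
# (memory-heavy-app and the image reference are placeholder names)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-heavy-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: memory-heavy-app
  template:
    metadata:
      labels:
        app: memory-heavy-app
    spec:
      affinity:
        podAntiAffinity:
          # "preferred" still schedules if no other node is free;
          # use requiredDuringSchedulingIgnoredDuringExecution for a hard rule
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: memory-heavy-app
                topologyKey: kubernetes.io/hostname
      containers:
        - name: app
          image: myregistry.azurecr.io/memory-heavy-app:latest
```

I would start with the preferred variant so pods are never left unschedulable, and only move to the required variant if co-location must be strictly forbidden.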
Lastly, regarding that error log, I believe you are correct. node-problem-detector-startup.sh kicking in is standard behaviour on AKS nodes when the kubelet process is misbehaving. If you are running into high resource usage, that is the most likely cause, and it should clear up after you adjust the node resources. To keep it from recurring, also set CPU/memory requests and limits on your workloads, for example:
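A short sketch of the container-level settings (the values here are placeholders; size them from your actual usage so the scheduler can pack nodes safely):

```yaml
# Container spec snippet: requests guide scheduling, limits cap usage
# (cpu/memory values are placeholder numbers, not recommendations)
containers:
  - name: app
    image: myregistry.azurecr.io/app:latest
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
```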
If the issues persist even after following these recommendations, I would suggest opening a support ticket with Microsoft for deeper investigation.