My goal is to reboot a VM whenever it has failed.
I previously tried creating an alert on memory % via VM Insights, since the failures seem to be caused by a memory leak. That doesn't always work: some machines fail without ever triggering the memory alert, apparently because they stop sending monitoring data (probably as a result of the failure itself!).
Monitor seems to offer something perfectly tailored to my needs: Heartbeat and its related queries.
I tried the following, grabbed from the Queries library:
// Not reporting VMs
// VMs that have not reported a heartbeat in the last 5 minutes.
// To create an alert for this query, click '+ New alert rule'
Heartbeat
| where TimeGenerated > ago(24h)
| summarize LastCall = max(TimeGenerated) by Computer, _ResourceId
| where LastCall < ago(5m)
Alas, when clicking on the New Alert Rule button, I see the following error message:
The query didn't return the TimeGenerated column. Please edit the query and include the TimeGenerated column.
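Taken at face value, the message asks for a TimeGenerated column in the result set, which the library query drops when it summarizes into LastCall. A variant that keeps the column might look like the untested sketch below. Naming the aggregate TimeGenerated is only an assumption about what the alert rule validation expects, not a confirmed fix.

```kusto
// Untested sketch: same logic as the library query, but the aggregate is
// named TimeGenerated (an assumption) so that column appears in the output.
Heartbeat
| where TimeGenerated > ago(24h)
| summarize TimeGenerated = max(TimeGenerated) by Computer, _ResourceId
| where TimeGenerated < ago(5m)
```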
Please see screenshots attached.
How can I overcome this error and get an alert based on a missing heartbeat?
Any other suggestions for building a reliable "reboot on failure" alert are welcome.
The Monitor | Logs interface with the queries library:
The Query with the New Alert button:
The Create Alert pane that shows up after clicking the New Alert button, with the error message: