Monitoring SQL Backup Failures with Azure Operational Insights Search and Dashboards

Besides the SQL Server Assessment Intelligence Pack, you can also use Log Management functionality to collect the ‘Application’ event log from your SQL Servers and monitor activity from it with search.

For example, this is my simple recipe for finding SQL backup failures:

Type=Event EventID=3041 EventLog=Application Source=MSSQLSERVER

That’s it. In my workspace I can see that I had a few of those back in January.

I can massage the output a little for better readability by piping my results into the Select command and keeping only certain fields:

Type=Event EventID=3041 EventLog=Application Source=MSSQLSERVER | Select Computer,RenderedDescription


Now, if you want to keep monitoring this, you can save the search to your favorites and pin it to your dashboard, so you can take a look even on the go with the mobile app!
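For a dashboard tile, a count per server is often easier to read at a glance than a list of raw events. A minimal sketch, assuming the Measure command with count() is available in your workspace's search syntax (it is not part of my original recipe):

```
Type=Event EventID=3041 EventLog=Application Source=MSSQLSERVER | Measure count() by Computer
```

This should return one row per computer with the number of backup-failure events in the selected time range.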

In my case I am investigating an occurrence in the past, and I want to understand why the failure occurred. The next logical step is to find the time when the backup failure happened by interacting with the time bars on the top left under ‘narrow your results’: click on the bars and keep narrowing down until only the few individual events remain, within a narrow enough time period.

Time Control Narrow your results


Now I want to see WHAT ELSE was happening on the same machine, so I add a filter for the ‘Computer’ field and remove everything else from my search:

Computer="BaconSCOM.BaconLand.com"

This shows me WHAT ELSE I know about that machine around that time frame. Among the things that are always there (events and security events don’t come at regular intervals, but all the time), I see there is ONE configuration change. Interesting!


Let’s drill into that… and we see that someone had installed a Defragmentation tool on the server around the same time as the backup failures!

ConfigurationChange
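Instead of clicking through the result types, the same drill-down can probably be typed directly. This is a sketch that assumes the record type is named ConfigurationChange, as in the label above, and reuses the computer name from my search:

```
Computer="BaconSCOM.BaconLand.com" Type=ConfigurationChange
```

Keep the narrowed time range selected so that only changes around the backup failure show up.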

I think we now understand why the backup failed that day… Operational Insights made it super easy to get to the root cause!