Monitor virtual machines with Azure Monitor: Collect data
This article is part of the guide Monitor virtual machines and their workloads in Azure Monitor. It describes how to configure collection of data once you've deployed the Azure Monitor agent to your Azure and hybrid virtual machines in Azure Monitor.
This article provides guidance on collecting the most common types of telemetry from virtual machines. The exact configuration that you choose will depend on the workloads that you run on your machines. Included in each section are sample log query alerts that you can use with that data.
- See Monitor virtual machines with Azure Monitor: Analyze monitoring data for more information about analyzing telemetry collected from your virtual machines.
- See Monitor virtual machines with Azure Monitor: Alerts for more information about using telemetry collected from your virtual machines to create alerts in Azure Monitor.
This scenario describes how to implement complete monitoring of your Azure and hybrid virtual machine environment. To get started monitoring your first Azure virtual machine, see Monitor Azure virtual machines.
Data collection rules
Data collection from the Azure Monitor agent is defined by one or more data collection rules (DCRs) that are stored in your Azure subscription and associated with your virtual machines.
For virtual machines, DCRs will define data such as events and performance counters to collect and specify the Log Analytics workspaces that data should be sent to. The DCR can also use transformations to filter out unwanted data and to add calculated columns. A single machine can be associated with multiple DCRs, and a single DCR can be associated with multiple machines. DCRs are delivered to any machines they're associated with where they're processed by the Azure Monitor agent.
View data collection rules
You can view the DCRs in your Azure subscription from Data Collection Rules in the Monitor menu in the Azure portal. DCRs support other data collection scenarios in Azure Monitor, so not all of your DCRs will necessarily be for virtual machines.
Create data collection rules
There are multiple methods to create data collection rules depending on the data collection scenario. In some cases, the Azure portal will walk you through the configuration while other scenarios will require you to edit the DCR directly. When you configure VM insights, it will create a preconfigured DCR for you automatically. The sections below identify common data to collect and how to configure data collection.
In some cases, you may need to edit an existing DCR to add functionality. For example, you may use the Azure portal to create a DCR that collects Windows or Syslog events. You then want to add a transformation to that DCR to filter out columns in the events that you don't want to collect.
As your environment matures and grows in complexity, you should implement a strategy for organizing your DCRs to assist in their management. See Best practices for data collection rule creation and management in Azure Monitor for guidance on different strategies.
Since your Azure Monitor cost is dependent on how much data you collect, you should ensure that you're not collecting any more than you need to meet your monitoring requirements. Your configuration will be a balance between your budget and how much insight you want into the operation of your virtual machines.
For strategies to reduce your Azure Monitor costs, see Cost optimization and Azure Monitor.
A typical virtual machine will generate between 1 GB and 3 GB of data per month, but this data size is highly dependent on the configuration of the machine itself, the workloads running on it, and the configuration of your data collection rules. Before you configure data collection across your entire virtual machine environment, begin collection on some representative machines to better predict your expected costs when deployed across your environment. Use the log queries in Data volume by computer to determine the amount of billable data collected for each machine and adjust accordingly.
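As a rough sketch of such a data volume query, the following sums billable bytes per computer over the last day. It assumes the standard _BilledSize and _IsBillable columns are present on your tables; the one-day window and the computerName column are illustrative choices.

```kusto
// Billable data volume per computer over the last 24 hours (illustrative window).
find where TimeGenerated > ago(24h) project _BilledSize, _IsBillable, Computer
| where _IsBillable == true
// Strip any domain suffix so the same machine isn't counted under two names.
| extend computerName = tolower(tostring(split(Computer, '.')[0]))
| summarize BillableDataBytes = sum(_BilledSize) by computerName
| sort by BillableDataBytes desc nulls last
```

A query like this scans every table, so it's best run over a short time range.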
Each data source that you collect may have a different method for filtering out unwanted data. You can also use transformations to implement more granular filtering and also to filter data from columns that provide little value. For example, you might have a Windows event that's valuable for alerting, but it includes columns with redundant or excessive data. You can create a transformation that allows the event to be collected but removes this excessive data.
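A transformation is a KQL statement applied to incoming data, which the statement refers to as source. The following minimal sketch keeps each event but drops columns that often carry redundant detail. The column names ParameterXml and EventData are assumptions based on the Event table; substitute the columns that apply to your data.

```kusto
// Keep the event, but remove columns assumed to hold redundant or excessive data.
source
| project-away ParameterXml, EventData
```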
Default data collection
Azure Monitor will automatically perform the following data collection without requiring any additional configuration.
Platform metrics for Azure virtual machines include important host metrics such as CPU, network, and disk utilization. They can be viewed on the Overview page, analyzed with metrics explorer for the machine in the Azure portal and used for metric alerts.
The Activity log is collected automatically and includes the recent activity of the machine, such as any configuration changes and when it was stopped and started. You can view the platform metrics and Activity log collected for each virtual machine host in the Azure portal.
You can view the Activity log for an individual machine or for all resources in a subscription. You should create a diagnostic setting to send this data into the same Log Analytics workspace used by your Azure Monitor agent to analyze it with the other monitoring data collected for the virtual machine. There's no cost for ingestion or retention of Activity log data.
VM availability information in Azure Resource Graph
Azure Resource Graph (ARG) is an Azure service that allows you to use the same KQL query language used in log queries to query your Azure resources at scale with complex filtering, grouping, and sorting by resource properties. You can use the VM health annotations in Azure Resource Graph for detailed failure attribution and downtime analysis.
See Monitor virtual machines with Azure Monitor: Analyze monitoring data for details on what data is collected and how to view it.
When you enable VM insights, it creates a data collection rule with the MSVMI- prefix that collects the following information. You can use this same DCR with other machines instead of creating a new one for each VM.
Common performance counters for the client operating system are sent to the InsightsMetrics table in the Log Analytics workspace. Counter names will be normalized to use the same common name regardless of the operating system type. See How to query logs from VM insights for a list of performance counters that are collected.
If you specified processes and dependencies to be collected, then the following tables are populated:
By default, VM insights doesn't enable collection of processes and dependencies, to save data ingestion costs. This data is required for the map feature, and enabling it also deploys the dependency agent to the machine. Enable this collection if you want to use the map feature.
Collect Windows and Syslog events
The operating system and applications in virtual machines will often write to the Windows Event Log or Syslog. You may create an alert as soon as a single event is found or wait for a series of matching events within a particular time window. You may also collect events for later analysis such as identifying particular trends over time, or for performing troubleshooting after a problem occurs.
See Collect events and performance counters from virtual machines with Azure Monitor Agent for guidance on creating a DCR to collect Windows and Syslog events. This will allow you to quickly create a DCR using the most common Windows event logs and Syslog facilities filtering by event level. For more granular filtering by criteria such as event ID, you can create a custom filter using XPath queries. You can further filter the collected data by editing the DCR to add a transformation.
Use the following guidance as a recommended starting point for event collection. Modify the DCR settings to filter unneeded events and add additional events depending on your requirements.
|Windows events||Collect at least Critical, Error, and Warning events for the System and Application logs to support alerting. Add Information events to analyze trends and support troubleshooting. Verbose events will rarely be useful and typically shouldn't be collected.|
|Syslog events||Collect at least LOG_WARNING events for each facility to support alerting. Add Information events to analyze trends and support troubleshooting. LOG_DEBUG events will rarely be useful and typically shouldn't be collected.|
Sample log queries - Windows events
||All Windows events.|
||All Windows events with severity of error.|
||Count of Windows events by source.|
||Count of Windows error events by source.|
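As a sketch, the queries described above could be written as follows, assuming the events are collected into the standard Event table with its EventLevelName and Source columns. Run each query separately.

```kusto
// All Windows events.
Event

// All Windows events with severity of error.
Event
| where EventLevelName == "Error"

// Count of Windows events by source.
Event
| summarize count() by Source

// Count of Windows error events by source.
Event
| where EventLevelName == "Error"
| summarize count() by Source
```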
Sample log queries - Syslog events
||All Syslog records with severity of error|
||Count of Syslog records by computer|
||Count of Syslog records by facility|
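As a sketch, the queries described above could be written as follows, assuming the records are collected into the standard Syslog table and that the severity value is recorded as "error" in the SeverityLevel column. Run each query separately.

```kusto
// All Syslog records with severity of error.
Syslog
| where SeverityLevel == "error"

// Count of Syslog records by computer.
Syslog
| summarize AggregatedValue = count() by Computer

// Count of Syslog records by facility.
Syslog
| summarize AggregatedValue = count() by Facility
```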
Collect performance counters
Performance data from the client can be sent to either Azure Monitor Metrics or Azure Monitor Logs, and you'll typically send them to both destinations. If you enabled VM insights, then a common set of performance counters is collected in Logs to support its performance charts. You can't modify this set of counters, but you can create additional DCRs to collect additional counters and send them to different destinations.
There are multiple reasons why you would want to create a DCR to collect guest performance:
- You aren't using VM insights, so client performance data isn't already being collected.
- Collect additional performance counters that aren't being collected by VM insights.
- Collect performance counters from other workloads running on your client.
- Send performance data to Azure Monitor Metrics where you can use them with metrics explorer and metrics alerts.
See Collect events and performance counters from virtual machines with Azure Monitor Agent for guidance on creating a DCR to collect performance counters. This will allow you to quickly create a DCR using the most common counters. To collect counters that aren't in that list, you can specify custom counters in the DCR.
You may choose to combine performance and event collection in the same data collection rule.
|Metrics||Host metrics are automatically sent to Azure Monitor Metrics, and you can use a DCR to collect client metrics so they can be analyzed together with metrics explorer or used with metrics alerts. This data is stored for 93 days.|
|Logs||Performance data stored in Azure Monitor Logs can be stored for extended periods and can be analyzed along with your event data using log queries with Log Analytics or log query alerts. You can also correlate data using complex logic across multiple machines, regions, and subscriptions. Performance data is sent to the following tables: InsightsMetrics for VM insights, and Perf for other performance data.|
Sample log queries
The following samples use the Perf table with custom performance data. For details on performance data collected by VM insights, see How to query logs from VM insights.
||All Performance data|
||All Performance data from a particular computer|
||All Performance data for a particular counter|
||Average CPU Utilization across all computers|
||Maximum CPU Utilization across all computers|
||Average Current Disk Queue length across all the instances of a given computer|
||95th Percentile of Disk Transfers/Sec across all computers|
||Hourly average of CPU usage across all computers|
||Hourly 70th percentile of every percentage (%) counter for a particular computer|
||Hourly average, minimum, maximum, and 75-percentile CPU usage for a specific computer|
||All Performance data from the Database performance object for the master database from the named SQL Server instance INST2.|
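As a sketch, several of the queries described above could be written as follows against the Perf table. The computer name MyComputer is a placeholder, and the counter and object names assume the standard Windows counter paths. Run each query separately.

```kusto
// All performance data from a particular computer (placeholder name).
Perf
| where Computer == "MyComputer"

// Average CPU utilization across all computers.
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize AvgCpu = avg(CounterValue) by Computer

// 95th percentile of Disk Transfers/sec across all computers.
Perf
| where CounterName == "Disk Transfers/sec"
| summarize Pct95 = percentile(CounterValue, 95) by Computer

// Hourly average of CPU usage across all computers.
Perf
| where CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize AvgCpu = avg(CounterValue) by bin(TimeGenerated, 1h), Computer

// Database performance object for the master database on SQL Server instance INST2.
Perf
| where ObjectName == "MSSQL$INST2:Databases" and InstanceName == "master"
```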
Collect text logs
Some applications write events to a text log stored on the virtual machine. Create a custom table and DCR to collect this data. You define the location of the text log, its detailed configuration, and the schema of the custom table. There's a cost for the ingestion and retention of this data in the workspace.
Sample log queries
The column names used here are for example only. The column names for your log will most likely be different.
||Count the number of events by code.|
||Create an alert rule on any error event.|
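As a sketch, the queries described above might look like the following. The table name MyApp_CL and the columns code and status are hypothetical; use the names from your own custom table. Run each query separately.

```kusto
// Count the number of events by code (hypothetical table and column).
MyApp_CL
| summarize count() by code

// Rows returned here can drive a log query alert rule on any error event.
MyApp_CL
| where status == "Error"
| summarize AggregatedValue = count() by Computer, bin(TimeGenerated, 15m)
```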
Collect IIS logs
IIS running on Windows machines writes logs to a text file. Configure IIS log collection using Collect IIS logs with Azure Monitor Agent. Records from the IIS log are stored in the W3CIISLog table in the Log Analytics workspace. There's a cost for the ingestion and retention of this data in the workspace.
Sample log queries
||Count the IIS log entries by URL for the host www.contoso.com.|
||Review the total bytes received by each IIS machine.|
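As a sketch, the queries described above could be written as follows, assuming the standard csHost, csUriStem, and csBytes columns of the W3CIISLog table. Run each query separately.

```kusto
// Count the IIS log entries by URL for the host www.contoso.com.
W3CIISLog
| where csHost == "www.contoso.com"
| summarize count() by csUriStem

// Total bytes received by each IIS machine.
W3CIISLog
| summarize sum(csBytes) by Computer
```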
Monitor a service or daemon
To monitor the status of a Windows service or Linux daemon, enable the Change Tracking and Inventory solution in Azure Automation. Azure Monitor has no ability on its own to monitor the status of a service or daemon. There are some possible methods to use, such as looking for events in the Windows event log, but this method is unreliable. You can also look for the process associated with the service running on the machine from the VMProcess table populated by VM insights. This table only updates every hour, which isn't typically sufficient if you want to use this data for alerting.
The Change Tracking and Inventory solution is different from the Change Analysis feature in VM insights. Change Analysis is in public preview and not yet included in this scenario.
For different options to enable the Change Tracking solution on your virtual machines, see Enable Change Tracking and Inventory. This solution includes methods to configure virtual machines at scale. You'll have to create an Azure Automation account to support the solution.
When you enable Change Tracking and Inventory, two new tables are created in your Log Analytics workspace. Use these tables for log queries and log query alert rules.
|ConfigurationChange||Changes to in-guest configuration data|
|ConfigurationData||Last reported state for in-guest configuration data|
Sample log queries
List all services and daemons that have recently started.
ConfigurationChange | where ConfigChangeType == "Daemons" or ConfigChangeType == "WindowsServices" | where SvcState == "Running" | sort by Computer, SvcName
Alert when a specific service stops. Use this query in a log alert rule.
ConfigurationData | where SvcName == "W3SVC" | where SvcState == "Stopped" | where ConfigDataType == "WindowsServices" | where SvcStartupType == "Auto" | summarize AggregatedValue = count() by Computer, SvcName, SvcDisplayName, SvcState, bin(TimeGenerated, 15m)
Alert when one of a set of services stops. Use this query in a log alert rule.
let services = dynamic(["omskd","cshost","schedule","wuauserv","healthservice","efs","wsusservice","SrmSvc","CertSvc","wmsvc","vpxd","winmgmt","netman","smsexec","w3svc","sms_site_vss_writer","ccmexe","spooler","eventsystem","netlogon","kdc","ntds","lsmserv","gpsvc","dns","dfsr","dfs","dhcp","DNSCache","dmserver","messenger","w32time","plugplay","rpcss","lanmanserver","lmhosts","eventlog","lanmanworkstation","winrm","mpssvc","dhcpserver","VSS","ClusSvc","MSExchangeTransport","MSExchangeIS"]); ConfigurationData | where ConfigDataType == "WindowsServices" | where SvcStartupType == "Auto" | where SvcName in~ (services) | where SvcState == "Stopped" | project TimeGenerated, Computer, SvcName, SvcDisplayName, SvcState | summarize AggregatedValue = count() by Computer, SvcName, SvcDisplayName, SvcState, bin(TimeGenerated, 15m)
Monitor a port
Port monitoring verifies that a machine is listening on a particular port. Two potential strategies for port monitoring are described here.
Dependency agent tables
If you're using VM insights with Processes and dependencies collection enabled, you can use VMConnection and VMBoundPort to analyze connections and ports on the machine. The VMBoundPort table is updated every minute with each process running on the computer and the port it's listening on. You can create a log query alert similar to the missing heartbeat alert to find processes that have stopped or to alert when the machine isn't listening on a particular port.
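A minimal sketch of such an alert query follows the missing heartbeat pattern: it returns machines that were listening on a port within the last day but haven't reported it in the last five minutes. The port number 443 and both time windows are illustrative assumptions.

```kusto
// Machines that listened on port 443 in the last day but not in the last 5 minutes.
VMBoundPort
| where TimeGenerated > ago(1d)
| where Port == 443
| summarize LastSeen = max(TimeGenerated) by Computer
| where LastSeen < ago(5m)
```

Used in a log query alert rule, any row returned indicates a machine that has stopped listening on the port.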
Review the count of ports open on your VMs, which is useful for assessing which VMs have configuration and security vulnerabilities.
VMBoundPort | where Ip != "127.0.0.1" | summarize by Computer, Machine, Port, Protocol | summarize OpenPorts=count() by Computer, Machine | order by OpenPorts desc
List the bound ports on your VMs, which is useful for assessing which VMs have configuration and security vulnerabilities.
VMBoundPort | distinct Computer, Port, ProcessName
Analyze network activity by port to determine how your application or service is configured.
VMBoundPort | where Ip != "127.0.0.1" | summarize BytesSent=sum(BytesSent), BytesReceived=sum(BytesReceived), LinksEstablished=sum(LinksEstablished), LinksTerminated=sum(LinksTerminated), arg_max(TimeGenerated, LinksLive) by Machine, Computer, ProcessName, Ip, Port, IsWildcardBind | project-away TimeGenerated | order by Machine, Computer, Port, Ip, ProcessName
Review bytes sent and received trends for your VMs.
VMConnection | summarize sum(BytesSent), sum(BytesReceived) by bin(TimeGenerated, 1h), Computer | order by Computer desc | render timechart
Use connection failures over time to determine if the failure rate is stable or changing. Replace acme-demo with the name of your computer.
VMConnection | where Computer == "acme-demo" | extend bythehour = datetime_part("hour", TimeGenerated) | summarize failCount = sum(LinksFailed) by bythehour | sort by bythehour asc | render timechart
Link status trends to analyze the behavior and connection status of a machine. Replace acme-demo with the name of your computer.
VMConnection | where Computer == "acme-demo" | summarize dcount(LinksEstablished), dcount(LinksLive), dcount(LinksFailed), dcount(LinksTerminated) by bin(TimeGenerated, 1h) | render timechart
The Connection Monitor feature of Network Watcher is used to test connections to a port on a virtual machine. A test verifies that the machine is listening on the port and that it's accessible on the network. Connection Monitor requires the Network Watcher extension on the client machine initiating the test. It doesn't need to be installed on the machine being tested. For details, see Tutorial - Monitor network communication using the Azure portal.
There's an extra cost for Connection Monitor. For details, see Network Watcher pricing.
Run a process on a local machine
Monitoring of some workloads requires a local process. An example is a PowerShell script that runs on the local machine to connect to an application and collect or process data. You can use Hybrid Runbook Worker, which is part of Azure Automation, to run a local PowerShell script. There's no direct charge for Hybrid Runbook Worker, but there is a cost for each runbook that it uses.
The runbook can access any resources on the local machine to gather required data. It can't send data directly to Azure Monitor or create an alert. To create an alert, have the runbook write an entry to a custom log and then configure that log to be collected by Azure Monitor. Create a log query alert rule that fires on that log entry.