Best practices for Azure Monitor Logs
This article provides architectural best practices for Azure Monitor Logs. The guidance is based on the five pillars of architecture excellence described in Azure Well-Architected Framework.
In the cloud, we acknowledge that failures happen. Instead of trying to prevent failures altogether, the goal is to minimize the effects of a single failing component. Use the following information to minimize failure of your Log Analytics workspaces and to protect the data they collect.
Log Analytics workspaces offer a high degree of reliability without any design decisions. Conditions where a temporary loss of access to the workspace can result in data loss are often mitigated by features of other Azure Monitor components such as data buffering with the Azure Monitor agent.
The reliability situations to consider for Log Analytics workspaces are availability of the workspace and protection of collected data in the rare case of failure of an Azure datacenter or region. There is currently no standard feature for failover between workspaces in different regions, but there are strategies that you can use if you have particular requirements for availability or compliance.
Some availability features require a dedicated cluster. Since this requires a commitment of at least 500 GB per day from all workspaces in the same region, reliability will not typically be your primary criterion for including dedicated clusters in your design.
- If you collect enough data for a dedicated cluster, create a dedicated cluster in an availability zone.
- If you require the workspace to be available in the case of a region failure, or you don't collect enough data for a dedicated cluster, configure data collection to send critical data to multiple workspaces in different regions.
- If you require data to be protected in the case of datacenter or region failure, configure data export from the workspace to save data in an alternate location.
- Create a health status alert rule for your Log Analytics workspace.
|If you collect enough data, create a dedicated cluster in a region that supports availability zones.||Workspaces linked to a dedicated cluster located in a region that supports availability zones remain available if a datacenter fails.|
|If you require the workspace to be available in the case of a region failure, or you don't collect enough data for a dedicated cluster, configure data collection to send critical data to multiple workspaces in different regions.||Configure your data sources to send to multiple workspaces in different regions. For example, configure data collection rules (DCRs) for multiple workspaces for Azure Monitor agent running on virtual machines, and multiple diagnostic settings to collect resource logs from Azure resources. This configuration results in duplicate ingestion and retention charges, so only use it for critical data.
Even though the data will be available in the alternate workspace in case of failure, resources that rely on the data, such as alerts and workbooks, wouldn't know to use this workspace. Consider storing ARM templates for critical resources with configuration for the alternate workspace in Azure DevOps or as disabled policies that can quickly be enabled in a failover scenario.|
|If you require data to be protected in the case of datacenter or region failure, configure data export from the workspace to save data in an alternate location.||The data export feature of Azure Monitor allows you to continuously export data sent to specific tables to Azure storage where it can be retained for extended periods. Use Azure Storage redundancy options including GRS and GZRS to replicate this data to other regions. If you require export of tables that aren't supported by data export then you can use other methods of exporting data including Logic apps to protect your data. This is primarily a solution to meet compliance for data retention since the data can be difficult to analyze and restore back to the workspace.|
|Create a health status alert rule for your Log Analytics workspace.||A health status alert will proactively notify you if a workspace becomes unavailable because of a datacenter or regional failure.|
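The multiple-workspace recommendation in the table above can be sketched as a data collection rule (DCR) body in which one data flow lists two Log Analytics destinations. This is a minimal illustration under stated assumptions, not a definitive template: the function name, destination names, and placeholder resource IDs are hypothetical, and the property names (`destinations.logAnalytics`, `workspaceResourceId`, `dataFlows`, and the syslog data source fields) reflect the DCR JSON structure as we understand it, so verify them against the current DCR reference before use. Creating a separate DCR per workspace is an equally valid approach.

```python
from typing import Dict, List


def build_multi_workspace_dcr(
    location: str,
    data_sources: dict,
    streams: List[str],
    workspaces: Dict[str, str],
) -> dict:
    """Build a DCR body that sends the same streams to every workspace given.

    `workspaces` maps a destination name to a workspace resource ID.
    """
    if len(workspaces) < 2:
        raise ValueError("provide at least two workspaces, ideally in different regions")

    destinations = [
        {"name": name, "workspaceResourceId": resource_id}
        for name, resource_id in workspaces.items()
    ]
    return {
        "location": location,
        "properties": {
            "dataSources": data_sources,
            "destinations": {"logAnalytics": destinations},
            # One flow, several destinations: each record is ingested
            # (and charged) once per workspace listed here.
            "dataFlows": [
                {"streams": streams, "destinations": [d["name"] for d in destinations]}
            ],
        },
    }


# Hypothetical example: critical syslog events duplicated to two workspaces.
dcr = build_multi_workspace_dcr(
    location="eastus",
    data_sources={
        "syslog": [
            {
                "name": "criticalSyslog",
                "streams": ["Microsoft-Syslog"],
                "facilityNames": ["auth"],
                "logLevels": ["Error", "Critical", "Alert", "Emergency"],
            }
        ]
    },
    streams=["Microsoft-Syslog"],
    workspaces={
        "primaryWorkspace": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<primary>",
        "secondaryWorkspace": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<secondary>",
    },
)
```

Restricting the data source to a narrow set of facilities and levels, as in this example, keeps the duplicated ingestion limited to critical data.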
Security is one of the most important aspects of any architecture. Azure Monitor provides features to employ both the principle of least privilege and defense-in-depth. Use the following information to maximize the security of your Log Analytics workspaces and ensure that only authorized users access collected data.
- Determine whether to combine your operational data and your security data in the same Log Analytics workspace.
- Configure access for different types of data in the workspace required for different roles in your organization.
- Consider using Azure Private Link to remove access to your workspace from public networks.
- Use customer-managed keys if you require your own encryption key to protect data and saved queries in your workspaces.
- Export audit data for long-term retention or immutability.
- Configure log query auditing to track which users are running queries.
- Determine a strategy to filter or obfuscate sensitive data in your workspace.
- Purge sensitive data that was accidentally collected.
|Determine whether to combine your operational data and your security data in the same Log Analytics workspace.||Your decision whether to combine this data depends on your particular security requirements. Combining them in a single workspace gives you better visibility across all your data, although your security team may require a dedicated workspace. See Design a Log Analytics workspace architecture for details on making this decision for your environment balancing it with criteria in other pillars.|
|Configure access for different types of data in the workspace required for different roles in your organization.||Set the access control mode for the workspace to "Use resource or workspace permissions" to allow resource owners to use resource-context to access their data without being granted explicit access to the workspace. This simplifies your workspace configuration and helps to ensure that users can't access data they shouldn't.
Assign the appropriate built-in role to grant workspace permissions to administrators at the subscription, resource group, or workspace level, depending on their scope of responsibilities.
Use table-level RBAC for users who require access to a set of tables across multiple resources. Users with table permissions have access to all the data in the table regardless of their resource permissions.
See Manage access to Log Analytics workspaces for details on the different options for granting access to data in the workspace.|
|Consider using Azure Private Link to remove access to your workspace from public networks.||Connections to public endpoints are secured with end-to-end encryption. If you require a private endpoint, you can use Azure Private Link to allow resources to connect to your Log Analytics workspace through authorized private networks. Private Link can also be used to force workspace data ingestion through ExpressRoute or a VPN. See Design your Azure Private Link setup to determine the best network and DNS topology for your environment.|
|Use customer-managed keys if you require your own encryption key to protect data and saved queries in your workspaces.||Azure Monitor ensures that all data and saved queries are encrypted at rest using Microsoft-managed keys (MMK). If you require your own encryption key and collect enough data for a dedicated cluster, use a customer-managed key for greater flexibility and key lifecycle control. If you use Microsoft Sentinel, make sure that you're familiar with the considerations at Set up Microsoft Sentinel customer-managed key.|
|Export audit data for long-term retention or immutability.||You may have collected audit data in your workspace that's subject to regulations requiring its long-term retention. Data in a Log Analytics workspace can't be altered, but it can be purged. Use data export to send data to an Azure storage account with immutability policies to protect against data tampering. Not every type of log has the same relevance for compliance, auditing, or security, so determine the specific data types that should be exported.|
|Configure log query auditing to track which users are running queries.||Log query auditing records the details for each query that's run in a workspace. Treat this audit data as security data and secure the LAQueryLogs table appropriately. Configure the audit logs for each workspace to be sent to the local workspace, or consolidate in a dedicated security workspace if you separate your operational and security data. Use Log Analytics workspace insights to periodically review this data and consider creating log query alert rules to proactively notify you if unauthorized users are attempting to run queries.|
|Determine a strategy to filter or obfuscate sensitive data in your workspace.||You may be collecting data that includes sensitive information. Filter records that shouldn't be collected using the configuration for the particular data source. Use a transformation if only particular columns in the data should be removed or obfuscated.
If you have standards that require the original data to be unmodified, then you can use the 'h' literal in KQL queries to obfuscate query results displayed in workbooks.|
|Purge sensitive data that was accidentally collected.||Check periodically for private data that may have been accidentally collected in your workspace and use data purge to remove it.|
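The log query auditing recommendation in the table above suggests alerting on queries run by unexpected users. The sketch below builds a query string that could back such a log query alert rule. It is an illustration only: the function name and the allowlist approach are hypothetical, and the `AADEmail` and `TimeGenerated` columns are assumed from the `LAQueryLogs` schema, so confirm them in your workspace before relying on the query.

```python
from typing import Iterable


def build_unexpected_user_query(allowed_emails: Iterable[str], lookback: str = "1d") -> str:
    """Return a KQL query that counts audited queries run by users not on the allowlist."""
    emails = sorted(set(allowed_emails))
    if not emails:
        raise ValueError("the allowlist must contain at least one user")

    # Quote each address as a KQL string literal, escaping embedded double quotes.
    quoted = ", ".join('"' + e.replace('"', '\\"') + '"' for e in emails)
    return (
        "LAQueryLogs\n"
        f"| where TimeGenerated > ago({lookback})\n"
        f"| where AADEmail !in~ ({quoted})\n"
        "| summarize QueryCount = count() by AADEmail\n"
    )


# Hypothetical allowlist of users expected to query the workspace.
query = build_unexpected_user_query(["secops@contoso.com", "admin@contoso.com"])
print(query)
```

Because the audit data is itself sensitive, a rule built on a query like this is best hosted in the workspace that holds your security data.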
Cost optimization refers to ways to reduce unnecessary expenses and improve operational efficiencies. You can significantly reduce your cost for Azure Monitor by understanding your different configuration options and opportunities to reduce the amount of data that it collects. See Azure Monitor cost and usage to understand the different ways that Azure Monitor charges and how to view your monthly bill.
See Optimize costs in Azure Monitor for cost optimization recommendations across all features of Azure Monitor.
- Configure pricing tier for the amount of data that each Log Analytics workspace typically collects.
- Configure tables used for debugging, troubleshooting, and auditing as Basic Logs.
- Configure data retention and archiving.
- Regularly analyze collected data to identify trends and anomalies.
- Create an alert when data collection is high.
- Consider a daily cap as a preventative measure to ensure that you don't exceed a particular budget.
|Determine whether to combine your operational data and your security data in the same Log Analytics workspace.||Since all data in a Log Analytics workspace is subject to Microsoft Sentinel pricing if Sentinel is enabled, there may be cost implications to combining this data. See Design a Log Analytics workspace architecture for details on making this decision for your environment balancing it with criteria in other pillars.|
|Configure pricing tier for the amount of data that each Log Analytics workspace typically collects.||By default, Log Analytics workspaces will use pay-as-you-go pricing with no minimum data volume. If you collect enough data, you can significantly decrease your cost by using a commitment tier, which allows you to commit to a daily minimum of data collected in exchange for a lower rate. If you collect enough data across workspaces in a single region, you can link them to a dedicated cluster and combine their collected volume using cluster pricing.
See Azure Monitor Logs cost calculations and options for details on commitment tiers and guidance on determining which is most appropriate for your level of usage. See Usage and estimated costs to view estimated costs for your usage at different pricing tiers.|
|Configure data retention and archiving.||There is a charge for retaining data in a Log Analytics workspace beyond the default of 31 days (90 days if Sentinel is enabled on the workspace and 90 days for Application Insights data). Consider your particular requirements for having data readily available for log queries. You can significantly reduce your cost by configuring Archived Logs, which allows you to retain data for up to seven years and still access it occasionally using search jobs or by restoring a set of data to the workspace.|
|Configure tables used for debugging, troubleshooting, and auditing as Basic Logs.||Tables in a Log Analytics workspace configured for Basic Logs have a lower ingestion cost in exchange for limited features and a charge for log queries. If you query these tables infrequently and don't use them for alerting, this query cost can be more than offset by the reduced ingestion cost.|
|Regularly analyze collected data to identify trends and anomalies.||Use Log Analytics workspace insights to periodically review the amount of data collected in your workspace. In addition to helping you understand the amount of data collected by different sources, it will identify anomalies and upward trends in data collection that could result in excess cost. Further analyze data collection using methods in Analyze usage in Log Analytics workspace to determine if there's additional configuration that can decrease your usage further. This is particularly important when you add a new set of data sources, such as a new set of virtual machines, or onboard a new service.|
|Create an alert when data collection is high.||To avoid unexpected bills, you should be proactively notified anytime you experience excessive usage. Notification allows you to address any potential anomalies before the end of your billing period.|
|Consider a daily cap as a preventative measure to ensure that you don't exceed a particular budget.||A daily cap disables data collection in a Log Analytics workspace for the rest of the day after your configured limit is reached. As described in When to use a daily cap, this shouldn't be used as a method to reduce costs.
If you do set a daily cap, in addition to creating an alert when the cap is reached, ensure that you also create an alert rule to be notified when some percentage has been reached (90% for example). This gives you an opportunity to investigate and address the cause of the increased data before the cap shuts off data collection.|
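The pricing tier recommendation in the table above comes down to simple arithmetic, sketched here as a comparison of pay-as-you-go against commitment tiers for a given daily volume. All prices are hypothetical placeholders, not published rates, and the sketch assumes that data above a commitment is billed at that tier's effective per-GB rate; check the pricing documentation for your region before drawing conclusions.

```python
from typing import Dict, Tuple


def daily_costs(daily_gb: float, payg_rate: float, tiers: Dict[int, float]) -> Dict[str, float]:
    """Return the estimated daily cost of each pricing option.

    `tiers` maps a commitment size in GB/day to its fixed daily price.
    """
    costs = {"pay-as-you-go": daily_gb * payg_rate}
    for tier_gb, tier_price in tiers.items():
        effective_rate = tier_price / tier_gb       # per-GB price implied by the tier
        overage_gb = max(0.0, daily_gb - tier_gb)   # volume above the commitment
        costs[f"commitment-{tier_gb}"] = tier_price + overage_gb * effective_rate
    return costs


def cheapest_option(daily_gb: float, payg_rate: float, tiers: Dict[int, float]) -> Tuple[str, float]:
    """Return the name and daily cost of the cheapest option."""
    costs = daily_costs(daily_gb, payg_rate, tiers)
    name = min(costs, key=costs.get)
    return name, costs[name]


# Hypothetical prices: 2.30 per GB pay-as-you-go, two commitment tiers.
HYPOTHETICAL_PAYG_RATE = 2.30
HYPOTHETICAL_TIERS = {100: 196.0, 200: 368.0}

for volume in (50, 90, 150):
    print(volume, cheapest_option(volume, HYPOTHETICAL_PAYG_RATE, HYPOTHETICAL_TIERS))
```

With these placeholder numbers, a commitment tier becomes cheaper slightly below the committed volume, which is why it's worth rerunning the comparison as your typical daily ingestion changes.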
Operational excellence refers to the operations processes required to keep a service running reliably in production. Use the following information to minimize the operational requirements for supporting Log Analytics workspaces.
- Design a workspace architecture with the minimal number of workspaces to meet your business requirements.
- Use Log Analytics workspace insights to track the health and performance of your Log Analytics workspaces.
- Create alert rules to be proactively notified of operational issues in the workspace.
|Design a workspace architecture with the minimal number of workspaces to meet your business requirements.||A single workspace, or at least a minimal number of workspaces, will maximize your operational efficiency. With all of your operational and security data in a single location, you increase your visibility into potential issues and make patterns easier to identify. You also minimize your requirement for cross-workspace queries and have fewer workspaces for which to manage configuration and data.
You may have requirements for multiple workspaces, such as multiple tenants or regulatory compliance requiring multiple regions, but you should balance those requirements against a goal of creating the minimal number of workspaces. See Design a Log Analytics workspace architecture for a list of decision criteria.|
|Use Log Analytics workspace insights to track the health and performance of your Log Analytics workspaces.||Log Analytics workspace insights provides a unified view of the usage, performance, health, agents, queries, and change log for all your workspaces. Review this information on a regular basis to track the health and operation of each of your workspaces.|
|Create alert rules to be proactively notified of operational issues in the workspace.||Each workspace has an operation table that logs important activities affecting the workspace. Create alert rules based on this table to be proactively notified when an operational issue occurs. You can use recommended alerts for the workspace to simplify the creation of the most critical alert rules.|
Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. Use the following information to ensure that your Log Analytics workspaces and log queries are configured for maximum performance.
- Configure log query auditing and use Log Analytics workspace insights to identify slow and inefficient queries.
|Configure log query auditing and use Log Analytics workspace insights to identify slow and inefficient queries.||Log query auditing stores the compute time required to run each query and the time until results are returned. Log Analytics workspace insights uses this data to list potentially inefficient queries in your workspace. Consider rewriting these queries to improve their performance. Refer to Optimize log queries in Azure Monitor for guidance on optimizing your log queries.|
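To make the idea in the table above concrete, the sketch below ranks query audit records by compute time to surface candidates for rewriting. It is a local illustration of the kind of review that Log Analytics workspace insights performs, not its actual implementation. The function name is hypothetical, and the record keys (`StatsCPUTimeMs`, `ResponseDurationMs`, `QueryText`) are assumed to match columns in the `LAQueryLogs` table.

```python
from typing import Dict, List


def slowest_queries(records: List[Dict], top_n: int = 3, min_cpu_ms: int = 0) -> List[Dict]:
    """Return the `top_n` records with the highest CPU time.

    Ties on CPU time are broken by response duration. Records below
    `min_cpu_ms` are ignored.
    """
    candidates = [r for r in records if r.get("StatsCPUTimeMs", 0) >= min_cpu_ms]
    ranked = sorted(
        candidates,
        key=lambda r: (r.get("StatsCPUTimeMs", 0), r.get("ResponseDurationMs", 0)),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical audit records, for example exported from a log query.
sample = [
    {"QueryText": "Heartbeat | take 10", "StatsCPUTimeMs": 15, "ResponseDurationMs": 120},
    {"QueryText": "Perf | summarize avg(CounterValue) by Computer", "StatsCPUTimeMs": 4200, "ResponseDurationMs": 9800},
    {"QueryText": "search *", "StatsCPUTimeMs": 61000, "ResponseDurationMs": 45000},
]

for record in slowest_queries(sample, top_n=2, min_cpu_ms=1000):
    print(record["StatsCPUTimeMs"], record["QueryText"])
```

Ranking by compute time rather than by elapsed time alone helps separate queries that are genuinely expensive from those that were merely delayed.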