Best practices for Azure Monitor Logs

This article provides architectural best practices for Azure Monitor Logs. The guidance is based on the five pillars of architectural excellence described in the Azure Well-Architected Framework.

Reliability

Reliability refers to the ability of a system to recover from failures and continue to function. Instead of trying to prevent failures altogether in the cloud, the goal is to minimize the effects of a single failing component. Use the following information to minimize failure of your Log Analytics workspaces and to protect the data they collect.

Log Analytics workspaces offer a high degree of reliability. Conditions where a temporary loss of access to the workspace can result in data loss are often mitigated by features such as data buffering with the Azure Monitor Agent and protection mechanisms built into the ingestion pipeline.

The resiliency features described in this section provide additional protection against data loss and help support business continuity. Some are in-region solutions and others provide cross-regional redundancy. Some are applied automatically and others require manual configuration. The comparison at the end of this section summarizes these features.

Some availability features require a dedicated cluster, which currently requires a commitment of at least 100 GB per day, aggregated across all workspaces linked to the cluster.
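To check whether you meet that threshold, add up the average daily ingestion of the workspaces you'd link to the cluster. The following Python sketch does only that arithmetic. The workspace names and volumes are hypothetical, and the 100 GB figure is the current minimum stated above, which might change.

```python
# Minimal sketch: does the combined daily ingestion of candidate workspaces
# reach the dedicated cluster commitment? Names and volumes are hypothetical.
CLUSTER_MIN_COMMITMENT_GB_PER_DAY = 100  # current minimum stated in this article


def aggregate_daily_gb(daily_gb_by_workspace: dict[str, float]) -> float:
    """Sum the average daily ingestion (GB/day) across candidate workspaces."""
    return sum(daily_gb_by_workspace.values())


def meets_cluster_commitment(daily_gb_by_workspace: dict[str, float],
                             minimum: float = CLUSTER_MIN_COMMITMENT_GB_PER_DAY) -> bool:
    """True if the aggregated volume reaches the cluster minimum."""
    return aggregate_daily_gb(daily_gb_by_workspace) >= minimum


if __name__ == "__main__":
    candidates = {"ws-app-prod": 62.5, "ws-infra-prod": 41.0}  # hypothetical
    print(aggregate_daily_gb(candidates))        # 103.5
    print(meets_cluster_commitment(candidates))  # True
```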

Design checklist

  • If you collect enough data for a dedicated cluster, create the dedicated cluster in a region that supports availability zones.
  • If you require the workspace to be available in the case of a region failure, or you don't collect enough data for a dedicated cluster, configure data collection to send critical data to multiple workspaces in different regions.
  • If you require data to be protected in the case of datacenter or region failure, configure data export from the workspace to save data in an alternate location.
  • For mission-critical workloads requiring high availability, consider implementing a federated workspace model.
  • Monitor the health of your Log Analytics workspaces.

Configuration recommendations

Recommendation Benefit
If you collect enough data, create a dedicated cluster in a region that supports availability zones. Workspaces linked to a dedicated cluster located in a region that supports availability zones remain available if a datacenter fails.

A dedicated cluster requires a commitment of at least 100 GB per day, aggregated across all linked workspaces in the same region. If you don't collect this much data, weigh the cost of this commitment against the reliability features it provides.
If you require data in your workspace to be available in the event of a region failure, send critical data to multiple workspaces in different regions. For example, configure data collection rules (DCRs) to send data from Azure Monitor Agent running on virtual machines to multiple workspaces, and configure multiple diagnostic settings to send resource logs from Azure resources to multiple workspaces.

Even though the data will be available in the alternate workspace in case of failure, resources that rely on the data, such as alerts and workbooks, won't know to use the alternate workspace. Consider storing ARM templates for critical resources with configuration for the alternate workspace in Azure DevOps or as disabled policies that can quickly be enabled in a failover scenario.

Tradeoff: This configuration results in duplicate ingestion and retention charges, so use it only for critical data.
For mission-critical workloads requiring high availability, consider implementing a federated workspace model that uses multiple workspaces to provide high availability in the case of regional failure. The Mission-critical guidance provides prescriptive best practices for architecting highly reliable applications on Azure. Its design methodology includes a federated workspace model with multiple Log Analytics workspaces to deliver high availability in the case of multiple failures, including the failure of an Azure region.
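As a sketch of the DCR approach, the following Python builds the JSON body of a data collection rule in which one data flow lists two Log Analytics destinations, so Azure Monitor Agent sends the same records to both workspaces. It assumes the standard DCR layout (dataSources, destinations, dataFlows). The resource IDs, destination names, region, and performance counter are placeholders, and you'd deploy the resulting JSON with your usual tooling (ARM, Bicep, Azure CLI, or REST).

```python
import json


def build_dual_destination_dcr(location: str, primary_ws_id: str,
                               secondary_ws_id: str) -> dict:
    """Return a DCR body that sends one performance counter to two workspaces."""
    return {
        "location": location,
        "properties": {
            "dataSources": {
                "performanceCounters": [
                    {
                        "name": "perfCounters",
                        "streams": ["Microsoft-Perf"],
                        "samplingFrequencyInSeconds": 60,
                        "counterSpecifiers": ["\\Processor(_Total)\\% Processor Time"],
                    }
                ]
            },
            "destinations": {
                "logAnalytics": [
                    # Workspace resource IDs are supplied by the caller (placeholders below).
                    {"name": "primaryWorkspace", "workspaceResourceId": primary_ws_id},
                    {"name": "secondaryWorkspace", "workspaceResourceId": secondary_ws_id},
                ]
            },
            # One flow with two destinations: each record is ingested, and billed, twice.
            "dataFlows": [
                {
                    "streams": ["Microsoft-Perf"],
                    "destinations": ["primaryWorkspace", "secondaryWorkspace"],
                }
            ],
        },
    }


if __name__ == "__main__":
    dcr = build_dual_destination_dcr(
        "eastus",
        "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<ws-region-a>",
        "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<ws-region-b>",
    )
    print(json.dumps(dcr, indent=2))
```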

This strategy eliminates cross-region egress costs and remains operational during a region failure, but it adds complexity that you must manage with the configuration and processes described in Health modeling and observability of mission-critical workloads on Azure.
If you require data to be protected in the case of datacenter or region failure, configure data export from the workspace to save data in an alternate location. The data export feature of Azure Monitor continuously exports data sent to specific tables to Azure Storage, where it can be retained for extended periods. Use Azure Storage redundancy options, including GRS and GZRS, to replicate this data to other regions. If you need to export tables that data export doesn't support, use other methods, such as Logic Apps, to protect your data. This is primarily a solution for meeting data retention compliance requirements, because exported data can be difficult to analyze and restore to the workspace.

This option is similar to the previous option of multicasting the data to different workspaces, but it costs less because the extra copy of the data is written to storage rather than ingested into a second workspace.

Data export is susceptible to regional incidents because it relies on the stability of the Azure Monitor ingestion pipeline in your region. It doesn't provide resiliency against incidents impacting the regional ingestion pipeline.
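Cross-region protection for exported data depends on the storage account's redundancy setting. The following Python sketch checks a storage SKU name against the geo-redundant SKUs (GRS, RA-GRS, GZRS, RA-GZRS). How you retrieve an account's SKU is left out, and the check covers only replication, not data export itself.

```python
# Minimal sketch: confirm that a data export target replicates to another region.
GEO_REDUNDANT_SKUS = frozenset({
    "Standard_GRS",     # geo-redundant storage
    "Standard_RAGRS",   # read-access geo-redundant storage
    "Standard_GZRS",    # geo-zone-redundant storage
    "Standard_RAGZRS",  # read-access geo-zone-redundant storage
})


def replicates_to_other_region(sku_name: str) -> bool:
    """True if the storage SKU copies data to a secondary region."""
    return sku_name in GEO_REDUNDANT_SKUS


if __name__ == "__main__":
    for sku in ("Standard_LRS", "Standard_ZRS", "Standard_GZRS"):
        print(sku, replicates_to_other_region(sku))
```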
Monitor the health of your Log Analytics workspaces. Use Log Analytics workspace insights to track failed queries, and create a health status alert to proactively notify you if a workspace becomes unavailable because of a datacenter or regional failure.

Compare resilience features and capabilities

The following list compares these features by scope of protection, setup, and cost.

  • Availability zones (in supported regions). Scope of protection: in-region. Setup: automatically enabled on dedicated clusters in supported regions. Cost: no cost.
  • Continuous data export. Scope of protection: protection from regional failure (see note 1). Setup: enable per table. Cost: cost of data export plus Storage blob or Event Hubs.
  • Dual ingestion. Scope of protection: protection from regional failure. Setup: enable per monitored resource. Cost: up to twice the cost of retention (depending on how much data you dual ingest) plus egress charges.

1 Data export provides cross-region protection if you export logs to a different region. In the event of an incident, previously exported data is backed up and readily available; however, further export might fail, depending on the nature of the incident.

Security

Security is one of the most important aspects of any architecture. Azure Monitor provides features to employ both the principle of least privilege and defense-in-depth. Use the following information to maximize the security of your Log Analytics workspaces and ensure that only authorized users access collected data.

Design checklist

  • Determine whether to combine your operational data and your security data in the same Log Analytics workspace.
  • Configure access for different types of data in the workspace required for different roles in your organization.
  • Consider using Azure Private Link to remove access to your workspace from public networks.
  • Use customer-managed keys if you require your own encryption key to protect data and saved queries in your workspaces.
  • Export audit data for long-term retention or immutability.
  • Configure log query auditing to track which users are running queries.
  • Determine a strategy to filter or obfuscate sensitive data in your workspace.
  • Purge sensitive data that was accidentally collected.
  • Enable Customer Lockbox for Microsoft Azure to approve or reject Microsoft data access requests.

Configuration recommendations

Recommendation Benefit
Determine whether to combine your operational data and your security data in the same Log Analytics workspace. Your decision whether to combine this data depends on your particular security requirements. Combining them in a single workspace gives you better visibility across all your data, although your security team might require a dedicated workspace. See Design a Log Analytics workspace strategy for details on making this decision for your environment balancing it with criteria in other pillars.

Tradeoff: There are potential cost implications to enabling Microsoft Sentinel in your workspace. See details in Design a Log Analytics workspace architecture.
Configure access for different types of data in the workspace required for different roles in your organization. Set the access control mode for the workspace to Use resource or workspace permissions so that resource owners can use resource-context to access their data without being granted explicit access to the workspace. This simplifies your workspace configuration and helps ensure that users can't access data they shouldn't.

Assign the appropriate built-in role to grant workspace permissions to administrators at either the subscription, resource group, or workspace level depending on their scope of responsibilities.

Use table-level RBAC for users who require access to a set of tables across multiple resources. Users with table permissions have access to all the data in the table regardless of their resource permissions.

See Manage access to Log Analytics workspaces for details on the different options for granting access to data in the workspace.
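One way to express table-level access is a custom role whose actions name individual tables. The following Python sketch builds such a role definition in the JSON shape that Azure CLI and PowerShell role-definition commands accept. The role name, table names, and scope are hypothetical, and the other table-level access methods described in Manage access to Log Analytics workspaces might suit your environment better.

```python
import json


def table_reader_role(role_name: str, tables: list[str],
                      assignable_scope: str) -> dict:
    """Custom role granting query access to specific workspace tables only."""
    actions = [
        "Microsoft.OperationalInsights/workspaces/read",
        "Microsoft.OperationalInsights/workspaces/query/read",
    ]
    # One query action per table the role is allowed to read.
    actions += [f"Microsoft.OperationalInsights/workspaces/query/{t}/read" for t in tables]
    return {
        "Name": role_name,
        "IsCustom": True,
        "Description": "Read access to tables: " + ", ".join(tables),
        "Actions": actions,
        "NotActions": [],
        "AssignableScopes": [assignable_scope],
    }


if __name__ == "__main__":
    role = table_reader_role("Heartbeat and AzureActivity Reader",   # hypothetical
                             ["Heartbeat", "AzureActivity"],
                             "/subscriptions/<subscription-id>")
    print(json.dumps(role, indent=2))
```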
Consider using Azure Private Link to remove access to your workspace from public networks. Connections to public endpoints are secured with end-to-end encryption. If you require a private endpoint, use Azure Private Link to allow resources to connect to your Log Analytics workspace through authorized private networks. You can also use Private Link to force workspace data ingestion through ExpressRoute or a VPN. See Design your Azure Private Link setup to determine the best network and DNS topology for your environment.
Use customer-managed keys if you require your own encryption key to protect data and saved queries in your workspaces. Azure Monitor ensures that all data and saved queries are encrypted at rest using Microsoft-managed keys (MMK). If you require your own encryption key and collect enough data for a dedicated cluster, use customer-managed keys for greater flexibility and key lifecycle control. If you use Microsoft Sentinel, make sure that you're familiar with the considerations at Set up Microsoft Sentinel customer-managed key.
Export audit data for long-term retention or immutability. You might have collected audit data in your workspace that's subject to regulations requiring its long-term retention. Data in a Log Analytics workspace can't be altered, but it can be purged. Use data export to send data to an Azure Storage account with immutability policies to protect against data tampering. Not every log type has the same relevance for compliance, auditing, or security, so determine which specific data types should be exported.
Configure log query auditing to track which users are running queries. Log query auditing records the details for each query that's run in a workspace. Treat this audit data as security data and secure the LAQueryLogs table appropriately. Configure the audit logs for each workspace to be sent to the local workspace, or consolidate in a dedicated security workspace if you separate your operational and security data. Use Log Analytics workspace insights to periodically review this data and consider creating log search alert rules to proactively notify you if unauthorized users are attempting to run queries.
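Once you retrieve LAQueryLogs records (for example, by exporting query results), reviewing them can be as simple as counting queries per user. The following Python sketch flags users who aren't on an approved list. It uses only the AADEmail column, and the sample rows and allowlist are hypothetical. A log search alert rule would apply similar logic inside the workspace.

```python
from collections import Counter


def unapproved_query_users(rows: list[dict], approved: set[str]) -> dict[str, int]:
    """Count queries per AADEmail and keep only users missing from `approved`."""
    counts = Counter(row.get("AADEmail") or "<unknown>" for row in rows)
    return {user: n for user, n in counts.items() if user not in approved}


if __name__ == "__main__":
    rows = [  # hypothetical LAQueryLogs rows
        {"AADEmail": "ops@contoso.com", "QueryText": "Heartbeat | take 10"},
        {"AADEmail": "intern@contoso.com", "QueryText": "SecurityEvent | take 10"},
    ]
    print(unapproved_query_users(rows, {"ops@contoso.com"}))
```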
Determine a strategy to filter or obfuscate sensitive data in your workspace. You might be collecting data that includes sensitive information. Filter records that shouldn't be collected using the configuration for the particular data source. Use a transformation if only particular columns in the data should be removed or obfuscated.

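If you have standards that require the original data to remain unmodified, obfuscate sensitive values at query time instead, for example in the queries behind your workbooks. The 'h' prefix for KQL string literals (for example, h'secret') is related but narrower: it obscures that literal in the query text when queries are logged or traced, and it doesn't change query results.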
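In a data collection rule, an ingestion-time transformation is held in the transformKql property of a data flow. The following Python sketch builds a data flow whose transformation removes named columns with project-away before the data is stored. The stream, destination, and column names are hypothetical.

```python
def data_flow_dropping_columns(stream: str, destination: str,
                               columns: list[str]) -> dict:
    """Return a DCR data flow whose transformation removes the given columns."""
    # "source" refers to the incoming records; project-away drops columns.
    transform = "source"
    if columns:
        transform += " | project-away " + ", ".join(columns)
    return {
        "streams": [stream],
        "destinations": [destination],
        "transformKql": transform,
    }


if __name__ == "__main__":
    # Hypothetical custom stream with two sensitive columns.
    print(data_flow_dropping_columns("Custom-AppEvents_CL", "primaryWorkspace",
                                     ["UserEmail", "ClientIP"]))
```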
Purge sensitive data that was accidentally collected. Check periodically for private data that might have been accidentally collected in your workspace and use data purge to remove it.
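The purge API takes a table name and a set of column filters. The following Python sketch builds that request body; the table, column, and value are hypothetical, and you'd submit the body with your own tooling. Purge is irreversible, so review the filters carefully first.

```python
import json


def purge_body(table: str, filters: list[tuple[str, str, object]]) -> dict:
    """Build a purge request body: one table plus (column, operator, value) filters."""
    return {
        "table": table,
        "filters": [
            {"column": column, "operator": operator, "value": value}
            for column, operator, value in filters
        ],
    }


if __name__ == "__main__":
    # Hypothetical: remove records that captured one user's email address.
    body = purge_body("AppEvents_CL", [("UserEmail", "==", "user@contoso.com")])
    print(json.dumps(body, indent=2))
```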
Enable Customer Lockbox for Microsoft Azure to approve or reject Microsoft data access requests. Customer Lockbox for Microsoft Azure provides you with an interface to review and approve or reject customer data access requests. It's used in cases where a Microsoft engineer needs to access customer data, whether in response to a customer-initiated support ticket or a problem identified by Microsoft. To enable Customer Lockbox, you need a dedicated cluster.

Cost optimization

Cost optimization refers to ways to reduce unnecessary expenses and improve operational efficiencies. You can significantly reduce your cost for Azure Monitor by understanding your different configuration options and opportunities to reduce the amount of data that it collects. See Azure Monitor cost and usage to understand the different ways that Azure Monitor charges and how to view your monthly bill.

Note

See Optimize costs in Azure Monitor for cost optimization recommendations across all features of Azure Monitor.

Design checklist

  • Determine whether to combine your operational data and your security data in the same Log Analytics workspace.
  • Configure the pricing tier for the amount of data that each Log Analytics workspace typically collects.
  • Configure data retention and archiving.
  • Configure tables used for debugging, troubleshooting, and auditing as Basic Logs.
  • Limit data collection from data sources for the workspace.
  • Regularly analyze collected data to identify trends and anomalies.
  • Create an alert when data collection is high.
  • Consider a daily cap as a preventative measure to ensure that you don't exceed a particular budget.
  • Set up alerts on Azure Advisor cost recommendations for Log Analytics workspaces.

Configuration recommendations

Recommendation Benefit
Determine whether to combine your operational data and your security data in the same Log Analytics workspace. Since all data in a Log Analytics workspace is subject to Microsoft Sentinel pricing if Sentinel is enabled, there might be cost implications to combining this data. See Design a Log Analytics workspace strategy for details on making this decision for your environment balancing it with criteria in other pillars.
Configure the pricing tier for the amount of data that each Log Analytics workspace typically collects. By default, Log Analytics workspaces use pay-as-you-go pricing with no minimum data volume. If you collect enough data, you can significantly decrease your cost by using a commitment tier, which lets you commit to a daily minimum of data collected in exchange for a lower rate. If you collect enough data across workspaces in a single region, you can link them to a dedicated cluster and combine their collected volume using cluster pricing.

See Azure Monitor Logs cost calculations and options for details on commitment tiers and guidance on determining which is most appropriate for your level of usage. See Usage and estimated costs to view estimated costs for your usage at different pricing tiers.
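To compare options, you can estimate a day's cost at pay-as-you-go and at each commitment tier. The following Python sketch assumes that a tier charges its fixed daily price and bills usage above the tier at the tier's effective per-GB rate. All prices are hypothetical placeholders, so substitute the values from the pricing page for your region.

```python
def payg_daily_cost(gb_per_day: float, price_per_gb: float) -> float:
    """Pay-as-you-go: every GB is billed at the same rate."""
    return gb_per_day * price_per_gb


def commitment_daily_cost(gb_per_day: float, tier_gb: float, tier_price: float) -> float:
    """Commitment tier: fixed daily price; overage billed at the tier's effective rate."""
    effective_rate = tier_price / tier_gb
    return max(tier_price, gb_per_day * effective_rate)


def cheapest_option(gb_per_day: float, payg_price: float,
                    tiers: dict[int, float]) -> tuple[str, float]:
    """Return the cheapest pricing option and its daily cost."""
    options = {"pay-as-you-go": payg_daily_cost(gb_per_day, payg_price)}
    for tier_gb, tier_price in tiers.items():
        options[f"{tier_gb} GB/day tier"] = commitment_daily_cost(gb_per_day, tier_gb, tier_price)
    best = min(options, key=options.get)
    return best, options[best]


if __name__ == "__main__":
    # Hypothetical prices: 3.00 per GB pay-as-you-go; 100 and 200 GB/day tiers.
    print(cheapest_option(120, 3.0, {100: 200.0, 200: 380.0}))  # ('100 GB/day tier', 240.0)
```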
Configure data retention and archiving. There's a charge for retaining data in a Log Analytics workspace beyond the default of 31 days (90 days if Microsoft Sentinel is enabled on the workspace, and 90 days for Application Insights data). Consider your particular requirements for having data readily available for log queries. You can significantly reduce your cost by configuring Archived Logs, which let you retain data for up to seven years and still access it occasionally by using search jobs or by restoring a set of data to the workspace.
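Retention charges depend on how much data sits in each retention period. The following Python sketch computes, for one table at a steady ingestion rate, the volume held in the included period, in paid interactive retention, and in archive. The 31-day included period comes from this article (pass 90 for the cases listed above); the ingestion rate is hypothetical, and the sketch assumes the workspace has been ingesting for longer than the total retention period.

```python
def retained_gb(gb_per_day: float, interactive_days: int, total_days: int,
                included_days: int = 31) -> dict[str, float]:
    """Steady-state GB held in each retention period for one table."""
    if total_days < interactive_days:
        raise ValueError("total retention can't be shorter than interactive retention")
    return {
        # Covered by the included period at no retention charge.
        "included": gb_per_day * min(interactive_days, included_days),
        # Interactive retention beyond the included period is charged.
        "paid_interactive": gb_per_day * max(0, interactive_days - included_days),
        # Archived data: kept after interactive retention ends.
        "archive": gb_per_day * (total_days - interactive_days),
    }


if __name__ == "__main__":
    # Hypothetical table: 10 GB/day, 90 days interactive, one year in total.
    print(retained_gb(10, 90, 365))
    # {'included': 310, 'paid_interactive': 590, 'archive': 2750}
```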
Configure tables used for debugging, troubleshooting, and auditing as Basic Logs. Tables in a Log Analytics workspace configured for Basic Logs have a lower ingestion cost in exchange for limited features and a charge for log queries. If you query these tables infrequently and don't use them for alerting, this query cost can be more than offset by the reduced ingestion cost.
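The tradeoff reduces to comparing two monthly totals. The following Python sketch assumes Basic Logs are charged a lower ingestion price plus a per-GB charge for data scanned by queries, and it ignores retention and any query charges on Analytics tables. All prices and volumes are hypothetical.

```python
def analytics_monthly_cost(gb_ingested: float, analytics_price_per_gb: float) -> float:
    """Ingestion cost of the table on the Analytics plan."""
    return gb_ingested * analytics_price_per_gb


def basic_monthly_cost(gb_ingested: float, basic_price_per_gb: float,
                       gb_scanned: float, query_price_per_gb: float) -> float:
    """Ingestion cost on the Basic plan plus the charge for data scanned by queries."""
    return gb_ingested * basic_price_per_gb + gb_scanned * query_price_per_gb


def basic_is_cheaper(gb_ingested: float, analytics_price: float, basic_price: float,
                     gb_scanned: float, query_price: float) -> bool:
    """True if Basic Logs cost less for this ingestion and query pattern."""
    basic = basic_monthly_cost(gb_ingested, basic_price, gb_scanned, query_price)
    return basic < analytics_monthly_cost(gb_ingested, analytics_price)


if __name__ == "__main__":
    # Hypothetical: 1,000 GB/month ingested, queries scan 200 GB/month.
    print(basic_is_cheaper(1000, 2.0, 0.5, 200, 0.005))
```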
Limit data collection from data sources for the workspace. The primary factor for the cost of Azure Monitor is the amount of data that you collect in your Log Analytics workspace, so you should ensure that you collect no more data than you require to assess the health and performance of your services and applications. See Design a Log Analytics workspace architecture for details on making this decision for your environment, balancing it with criteria in other pillars.

Tradeoff: There might be a tradeoff between cost and your monitoring requirements. For example, you might be able to detect a performance issue more quickly with a high sample rate, but you might want a lower sample rate to save costs. Most environments have multiple data sources with different types of collection, so you need to balance your particular requirements with your cost targets for each. See Cost optimization in Azure Monitor for recommendations on configuring collection for different data sources.
Regularly analyze collected data to identify trends and anomalies. Use Log Analytics workspace insights to periodically review the amount of data collected in your workspace. In addition to helping you understand the amount of data collected by different sources, it identifies anomalies and upward trends in data collection that could result in excess cost. Further analyze data collection by using the methods in Analyze usage in Log Analytics workspace to determine whether additional configuration can decrease your usage further. This is particularly important when you add a new set of data sources, such as a new set of virtual machines, or onboard a new service.
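A simple way to spot an upward trend is to compare the most recent week's ingestion with the average of the three weeks before it, the same window that the Advisor ingestion-anomaly recommendation later in this section describes. The following Python sketch uses a hypothetical 1.5x threshold; it isn't the algorithm that Azure Monitor or Advisor uses.

```python
from statistics import mean


def ingestion_spike(weekly_gb: list[float], threshold: float = 1.5) -> bool:
    """True if the latest week exceeds the prior three-week average by `threshold`x."""
    if len(weekly_gb) < 4:
        raise ValueError("need at least four weeks of ingestion data")
    *_, w1, w2, w3, latest = weekly_gb
    baseline = mean([w1, w2, w3])
    if baseline == 0:
        return latest > 0  # any ingestion after three empty weeks counts as a spike
    return latest / baseline >= threshold


if __name__ == "__main__":
    print(ingestion_spike([100, 105, 95, 180]))  # hypothetical weekly GB totals
```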
Create an alert when data collection is high. To avoid unexpected bills, you should be proactively notified anytime you experience excessive usage. Notification allows you to address any potential anomalies before the end of your billing period.
Consider a daily cap as a preventative measure to ensure that you don't exceed a particular budget. A daily cap stops data collection in a Log Analytics workspace for the rest of the day after your configured limit is reached. Don't use it as a method to reduce costs, as described in When to use a daily cap.

If you do set a daily cap, in addition to creating an alert when the cap is reached, ensure that you also create an alert rule to be notified when some percentage has been reached (90% for example). This gives you an opportunity to investigate and address the cause of the increased data before the cap shuts off data collection.
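The two-stage notification described here amounts to comparing today's ingestion with two thresholds. The following Python sketch returns a status for a given volume. The 90% warning level is the example from the text, and the cap and volumes are hypothetical; in practice you'd implement this logic as alert rules rather than application code.

```python
def cap_status(ingested_today_gb: float, daily_cap_gb: float,
               warn_fraction: float = 0.9) -> str:
    """Classify today's ingestion against a daily cap and an early-warning level."""
    if daily_cap_gb <= 0:
        raise ValueError("daily cap must be positive")
    if ingested_today_gb >= daily_cap_gb:
        return "cap-reached"  # collection stops for the rest of the day
    if ingested_today_gb >= warn_fraction * daily_cap_gb:
        return "warning"      # time to investigate before the cap is hit
    return "ok"


if __name__ == "__main__":
    for gb in (10, 46, 50):  # hypothetical volumes against a 50 GB cap
        print(gb, cap_status(gb, 50))
```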
Set up alerts on Azure Advisor cost recommendations for Log Analytics workspaces. Azure Advisor recommendations for Log Analytics workspaces proactively alert you when there's an opportunity to optimize your costs. Create Azure Advisor alerts for these cost recommendations:
  • Consider configuring the cost-effective Basic logs plan on selected tables - We've identified ingestion of more than 1 GB per month to tables that are eligible for the low-cost Basic log data plan. The Basic log plan gives you search capabilities for debugging and troubleshooting at a lower cost.
  • Consider changing pricing tier - Based on your current usage volume, investigate changing your pricing (commitment) tier to receive a discount and reduce costs.
  • Consider removing unused restored tables - You have one or more tables with restored data active in your workspace. If you're no longer using the restored data, delete the table to avoid unnecessary charges.
  • Data ingestion anomaly was detected - We've identified a much higher ingestion rate over the past week, based on your ingestion in the three previous weeks. Take note of this change and the expected change in your costs.
You can also view automatically generated recommendations by selecting Overview > Recommendations or Advisor recommendations from your Log Analytics workspace resource menu.

Operational excellence

Operational excellence refers to the operations processes required to keep a service running reliably in production. Use the following information to minimize the operational requirements for supporting Log Analytics workspaces.

Design checklist

  • Design a workspace architecture with the minimal number of workspaces to meet your business requirements.
  • Use Infrastructure as Code (IaC) when managing multiple workspaces.
  • Use Log Analytics workspace insights to track the health and performance of your Log Analytics workspaces.
  • Create alert rules to be proactively notified of operational issues in the workspace.
  • Ensure that you have a well-defined operational process for data segregation.

Configuration recommendations

Recommendation Benefit
Design a workspace strategy to meet your business requirements. See Design a Log Analytics workspace architecture for guidance on designing a strategy for your Log Analytics workspaces including how many to create and where to place them.

A single workspace, or at least a minimal number of workspaces, maximizes your operational efficiency. It limits the distribution of your operational and security data, which increases your visibility into potential issues, makes patterns easier to identify, and minimizes your maintenance requirements.

You might have requirements for multiple workspaces such as multiple tenants, or you might need workspaces in multiple regions to support your availability requirements. In these cases, ensure that you have appropriate processes in place to manage this increased complexity.
Use Infrastructure as Code (IaC) when managing multiple workspaces. Use IaC to define the details of your workspaces in ARM templates, Bicep, or Terraform. This lets you use your existing DevOps processes to deploy new workspaces and Azure Policy to enforce their configuration.
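With IaC, every workspace comes from the same definition. The following Python sketch generates an ARM template that declares several workspaces with identical settings, including the Use resource or workspace permissions access mode (the enableLogAccessUsingOnlyResourcePermissions feature). Workspace names, regions, and retention are hypothetical, and apiVersion 2022-10-01 is one published version of the workspace resource type; use the version your organization standardizes on.

```python
import json

TEMPLATE_SCHEMA = "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#"


def workspace_resource(name: str, location: str, retention_days: int = 30) -> dict:
    """One Log Analytics workspace with the shared baseline settings."""
    return {
        "type": "Microsoft.OperationalInsights/workspaces",
        "apiVersion": "2022-10-01",
        "name": name,
        "location": location,
        "properties": {
            "sku": {"name": "PerGB2018"},
            "retentionInDays": retention_days,
            # True corresponds to the "Use resource or workspace permissions" mode.
            "features": {"enableLogAccessUsingOnlyResourcePermissions": True},
        },
    }


def template_for(workspaces: list[tuple[str, str]]) -> dict:
    """ARM template declaring every (name, location) pair with the same settings."""
    return {
        "$schema": TEMPLATE_SCHEMA,
        "contentVersion": "1.0.0.0",
        "resources": [workspace_resource(name, loc) for name, loc in workspaces],
    }


if __name__ == "__main__":
    # Hypothetical workspaces in two regions.
    print(json.dumps(template_for([("ws-eastus", "eastus"), ("ws-westeurope", "westeurope")]), indent=2))
```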
Use Log Analytics workspace insights to track the health and performance of your Log Analytics workspaces. Log Analytics workspace insights provides a unified view of the usage, performance, health, agents, queries, and change log for all your workspaces. Review this information on a regular basis to track the health and operation of each of your workspaces.
Create alert rules to be proactively notified of operational issues in the workspace. Each workspace has an Operation table that logs important activities affecting the workspace. Create alert rules based on this table to be proactively notified when an operational issue occurs. You can use recommended alerts for the workspace to simplify the creation of the most critical alert rules.
Ensure that you have a well-defined operational process for data segregation. You may have different requirements for different types of data stored in your workspace. Make sure that you clearly understand such requirements as data retention and security when designing your workspace strategy and configuring settings such as permissions and archiving. You should also have a clearly defined process for occasionally purging data with personal information that's accidentally collected.

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. Use the following information to ensure that your Log Analytics workspaces and log queries are configured for maximum performance.

Design checklist

  • Configure log query auditing and use Log Analytics workspace insights to identify slow and inefficient queries.

Configuration recommendations

Recommendation Benefit
Configure log query auditing and use Log Analytics workspace insights to identify slow and inefficient queries. Log query auditing stores the compute time required to run each query and the time until results are returned. Log Analytics workspace insights uses this data to list potentially inefficient queries in your workspace. Consider rewriting these queries to improve their performance. Refer to Optimize log queries in Azure Monitor for guidance on optimizing your log queries.
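If you pull LAQueryLogs records for offline review, ranking them by compute time is a quick way to find candidates for rewriting. The following Python sketch sorts rows by StatsCPUTimeMs and reports ResponseDurationMs alongside. Both are LAQueryLogs columns; the sample rows are hypothetical.

```python
def slowest_queries(rows: list[dict], top: int = 5) -> list[tuple[str, float, float]]:
    """Return (QueryText, StatsCPUTimeMs, ResponseDurationMs) for the costliest queries."""
    ranked = sorted(rows, key=lambda r: r.get("StatsCPUTimeMs") or 0, reverse=True)
    return [
        (r.get("QueryText", ""), r.get("StatsCPUTimeMs") or 0, r.get("ResponseDurationMs") or 0)
        for r in ranked[:top]
    ]


if __name__ == "__main__":
    rows = [  # hypothetical LAQueryLogs rows
        {"QueryText": "Heartbeat | take 10", "StatsCPUTimeMs": 12, "ResponseDurationMs": 80},
        {"QueryText": "search *", "StatsCPUTimeMs": 9500, "ResponseDurationMs": 42000},
    ]
    for row in slowest_queries(rows, top=1):
        print(row)
```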
