Best practices for Azure Monitor Logs

This article provides architectural best practices for Azure Monitor Logs. The guidance is based on the five pillars of architecture excellence described in the Azure Well-Architected Framework.

Reliability

Reliability refers to the ability of a system to recover from failures and continue to function. The goal is to minimize the effects of a single failing component. Use the following information to minimize failure of your Log Analytics workspaces and to protect the data they collect.

Log Analytics workspaces offer a high degree of reliability. The ingestion pipeline, which sends collected data to the Log Analytics workspace, validates that the workspace successfully processes each log record before it removes the record from the pipeline. If the ingestion pipeline isn't available, the agents that send the data buffer the logs and retry sending them for many hours.
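The guarantee described above, where a record is removed only after the receiver confirms it, is the same pattern a sender uses when it buffers and retries. The following Python sketch illustrates that pattern in general terms; it isn't the Azure Monitor Agent's implementation, and `send`, the batch size, and the backoff limits (including the four-hour give-up window standing in for "many hours") are hypothetical.

```python
import time
from collections import deque
from itertools import islice
from typing import Callable, Deque, List


def drain_with_retry(
    buffer: Deque[str],
    send: Callable[[List[str]], bool],
    batch_size: int = 100,
    base_delay: float = 1.0,
    max_delay: float = 300.0,
    max_elapsed: float = 4 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send buffered records in batches and return how many were delivered.

    Records leave the buffer only after `send` confirms the whole batch,
    so a failed attempt never loses data.
    """
    sent = 0
    delay = base_delay
    waited = 0.0
    while buffer:
        batch = list(islice(buffer, batch_size))
        if send(batch):
            # Confirmed by the receiver: now it's safe to drop these records.
            for _ in batch:
                buffer.popleft()
            sent += len(batch)
            delay = base_delay
            continue
        if waited + delay > max_elapsed:
            break  # Give up for now; undelivered records stay buffered.
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # Exponential backoff with a ceiling.
    return sent
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting; in normal use the default `time.sleep` applies.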

Azure Monitor Logs features that enhance resilience

Azure Monitor Logs offers several features that enhance workspace resilience to various types of issues. You can use these features individually or in combination, depending on your needs.

This video provides an overview of reliability and resilience options available for Log Analytics workspaces:

In-region protection using availability zones

Each Azure region that supports availability zones has a set of datacenters equipped with independent power, cooling, and networking infrastructure.

Azure Monitor Logs availability zones are redundant, which means that Microsoft spreads service requests and replicates data across different zones in supported regions. If an incident affects one zone, Microsoft automatically uses a different availability zone in the region instead. You don't need to take any action because switching between zones is seamless.

In most regions, Azure Monitor Logs availability zones support data resilience, which means your stored data is protected against data loss related to zonal failures, but service operations might still be impacted by regional incidents. If the service is unable to run queries, you can't view the logs until the issue is resolved.

A subset of the availability zones that support data resilience also support service resilience, which means that Azure Monitor Logs service operations - for example, log ingestion, queries, and alerts - can continue in the event of a zone failure.

Availability zones protect against infrastructure-related incidents, such as storage failures. They don’t protect against application-level issues, such as faulty code deployments or certificate failures, which impact the entire region.

Backup of data from specific tables using continuous export

You can continuously export data sent to specific tables in your Log Analytics workspace to Azure storage accounts.

The storage account you export data to must be in the same region as your Log Analytics workspace. To protect and have access to your ingested logs, even if the workspace region is down, use a geo-redundant storage account, as explained in Configuration recommendations.

The export mechanism doesn’t provide protection from incidents impacting the ingestion pipeline or the export process itself.

Note

You can access data in a storage account from Azure Monitor Logs using the externaldata operator. However, the exported data is stored in five-minute blobs and analyzing data spanning multiple blobs can be cumbersome. Therefore, exporting data to a storage account is a good data backup mechanism, but having the backed up data in a storage account is not ideal if you need it for analysis in Azure Monitor Logs. You can query large volumes of blob data using Azure Data Explorer, Azure Data Factory, or any other storage access tool.
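Because exported data lands in five-minute blobs, even a short analysis window touches many blobs. The following sketch computes which five-minute windows a time range spans. It assumes the blobs are aligned to UTC five-minute boundaries and expects timezone-aware datetimes; the blob naming scheme isn't modeled here.

```python
from datetime import datetime, timedelta, timezone
from typing import List

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def five_minute_windows(start: datetime, end: datetime) -> List[datetime]:
    """Return the start of every five-minute window that overlaps [start, end)."""
    if end <= start:
        return []
    # Floor the start time to the enclosing five-minute boundary.
    current = start - (start - EPOCH) % WINDOW
    windows = []
    while current < end:
        windows.append(current)
        current += WINDOW
    return windows
```

Under this alignment assumption, one day of data for a single table spans 288 blobs, which is why tools built for bulk blob access are a better fit for analysis.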

Cross-regional data protection and service resilience using workspace replication (preview)

Workspace replication (preview) is the most extensive resilience solution because it replicates the Log Analytics workspace and incoming logs to another region.

Workspace replication protects both your logs and the service operations, and allows you to continue monitoring your systems in the event of infrastructure or application-related region-wide incidents.

In contrast with availability zones, which Microsoft manages end-to-end, you need to monitor your primary workspace's health and decide when to switch over to the workspace in the secondary region and back.
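Because the switch-over decision is yours, it helps to make the rule explicit. The sketch below applies hysteresis to a stream of health-probe results: it fails over after several consecutive failures and switches back only after a longer run of successes, which avoids flapping between regions. The probe, the thresholds, and the `FailoverState` type are hypothetical; the actual switch is performed through workspace replication and isn't modeled here.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    active: str = "primary"  # Workspace that clients currently use.
    consecutive_failures: int = 0
    consecutive_successes: int = 0


def record_probe(state: FailoverState, primary_healthy: bool,
                 fail_after: int = 3, recover_after: int = 5) -> str:
    """Update the state with one health-probe result and return the active workspace."""
    if primary_healthy:
        state.consecutive_successes += 1
        state.consecutive_failures = 0
        if state.active == "secondary" and state.consecutive_successes >= recover_after:
            state.active = "primary"  # Switch back only after sustained recovery.
    else:
        state.consecutive_failures += 1
        state.consecutive_successes = 0
        if state.active == "primary" and state.consecutive_failures >= fail_after:
            state.active = "secondary"  # Fail over after repeated failures.
    return state.active
```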

Design checklist

  • To ensure service and data resilience to region-wide incidents, enable workspace replication.
  • To ensure in-region protection against datacenter failure, create your workspace in a region that supports availability zones.
  • For cross-regional backup of data in specific tables, use the continuous export feature to send data to a geo-replicated storage account.
  • Monitor the health of your Log Analytics workspaces.

Configuration recommendations

Recommendation: To ensure the greatest degree of resilience, enable workspace replication.
Benefit: Cross-regional resilience for workspace data and service operations.

Workspace replication (preview) ensures high availability by creating a secondary instance of your workspace in another region and ingesting your logs to both workspaces.

When needed, switch to your secondary workspace until the issues impacting your primary workspace are resolved. You can continue ingesting logs, querying data, using dashboards, alerts, and Sentinel in your secondary workspace. You also have access to logs ingested before the region switch.

This is a paid feature, so consider whether you want to replicate all of your incoming logs, or only some data streams.
Recommendation: If possible, create your workspace in a region that supports Azure Monitor service resilience.
Benefit: In-region resilience of workspace data and service operations in the event of datacenter issues.

Availability zones that support service resilience also support data resilience. This means that even if an entire datacenter becomes unavailable, the redundancy between zones allows Azure Monitor service operations, like ingestion and querying, to continue to work, and your ingested logs to remain available.

Availability zones provide in-region protection, but don't protect against issues that impact the entire region.

For information about which regions support service resilience and data resilience, see Enhance data and service resilience in Azure Monitor Logs with availability zones.
Recommendation: Create your workspace in a region that supports data resilience.
Benefit: In-region protection against loss of the logs in your workspace in the event of datacenter issues.

Creating your workspace in a region that supports data resilience means that even if an entire datacenter becomes unavailable, your ingested logs are safe. However, if the service is unable to run queries, you can't view the logs until the issue is resolved.

For information about which regions support data resilience, see Enhance data and service resilience in Azure Monitor Logs with availability zones.
Recommendation: Configure data export from specific tables to a storage account that's replicated across regions.
Benefit: Maintain a backup copy of your log data in a different region.

The data export feature of Azure Monitor allows you to continuously export data sent to specific tables to Azure storage where it can be retained for extended periods. Use a geo-redundant storage (GRS) or geo-zone-redundant storage (GZRS) account to keep your data safe even if an entire region becomes unavailable. To make your data readable from the other regions, configure your storage account for read access to the secondary region. For more information, see Azure Storage redundancy on a secondary region and Azure Storage read access to data in the secondary region.

For tables that don't support continuous data export, you can use other methods of exporting data, including Logic Apps, to protect your data. This is primarily a solution for meeting data retention compliance requirements, because the exported data can be difficult to analyze and restore to the workspace.

Data export is susceptible to regional incidents because it relies on the stability of the Azure Monitor ingestion pipeline in your region. It doesn't provide resiliency against incidents impacting the regional ingestion pipeline.
Recommendation: Monitor the health of your Log Analytics workspaces.
Benefit: Use Log Analytics workspace insights to track failed queries, and create health status alerts to proactively notify you if a workspace becomes unavailable because of a datacenter or regional failure.

Compare Azure Monitor Logs resilience features

  • Workspace replication. Scope of protection: cross-region protection against region-wide incidents. Setup: enable replication of the workspace and related data collection rules, and switch between regions as needed. Cost: based on the number of replicated GBs and the region.
  • Availability zones (service resilience in supported regions). Scope of protection: in-region protection against datacenter issues. Setup: automatically enabled in supported regions. Cost: no cost.
  • Continuous data export. Scope of protection: protection from data loss because of a regional failure (1). Setup: enable per table. Cost: cost of data export plus Storage blob or Event Hubs.

1 Data export provides cross-region protection if you export logs to a geo-replicated storage account. In the event of an incident, previously exported data is backed up and readily available; however, further export might fail, depending on the nature of the incident.

Security

Security is one of the most important aspects of any architecture. Azure Monitor provides features to employ both the principle of least privilege and defense-in-depth. Use the following information to maximize the security of your Log Analytics workspaces and ensure that only authorized users access collected data.

Design checklist

  • Determine whether to combine your operational data and your security data in the same Log Analytics workspace.
  • Configure access for different types of data in the workspace required for different roles in your organization.
  • Consider using Azure Private Link to remove access to your workspace from public networks.
  • Use customer-managed keys if you require your own encryption key to protect data and saved queries in your workspaces.
  • Export audit data for long-term retention or immutability.
  • Configure log query auditing to track which users are running queries.
  • Determine a strategy to filter or obfuscate sensitive data in your workspace.
  • Purge sensitive data that was accidentally collected.
  • Enable Customer Lockbox for Microsoft Azure to approve or reject Microsoft data access requests.

Configuration recommendations

Recommendation: Determine whether to combine your operational data and your security data in the same Log Analytics workspace.
Benefit: Your decision whether to combine this data depends on your particular security requirements. Combining them in a single workspace gives you better visibility across all your data, although your security team might require a dedicated workspace. See Design a Log Analytics workspace strategy for details on making this decision for your environment, balancing it against criteria in the other pillars.

Tradeoff: There are potential cost implications to enabling Sentinel in your workspace. See details in Design a Log Analytics workspace architecture.
Recommendation: Configure access for different types of data in the workspace required for different roles in your organization.
Benefit: Set the access control mode for the workspace to Use resource or workspace permissions to allow resource owners to use resource-context to access their data without being granted explicit access to the workspace. This simplifies your workspace configuration and helps ensure that users can't access data they shouldn't.

Assign the appropriate built-in role to grant workspace permissions to administrators at either the subscription, resource group, or workspace level depending on their scope of responsibilities.

Use table-level RBAC for users who require access to a set of tables across multiple resources. Users with table permissions have access to all the data in the table regardless of their resource permissions.

See Manage access to Log Analytics workspaces for details on the different options for granting access to data in the workspace.
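The interplay between the access control mode, resource permissions, and table-level RBAC described above can be summarized as a decision function. This is a deliberately simplified conceptual model in Python, not how Azure evaluates permissions; the `User` and `Record` types and their fields are hypothetical, and real deployments involve role definitions and scopes that aren't modeled here.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    resources: Set[str] = field(default_factory=set)  # Resources the user can read.
    tables: Set[str] = field(default_factory=set)     # Tables granted via table-level RBAC.
    workspace_reader: bool = False                     # Explicit access to the whole workspace.


@dataclass
class Record:
    table: str
    resource_id: str  # Resource that emitted the record.


def can_read(user: User, record: Record, resource_or_workspace_mode: bool = True) -> bool:
    """Simplified model: can this user see this record?"""
    if user.workspace_reader:
        return True
    if record.table in user.tables:
        # Table permissions apply regardless of resource permissions.
        return True
    # Resource-context access only works in "Use resource or workspace permissions" mode.
    return resource_or_workspace_mode and record.resource_id in user.resources
```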
Recommendation: Consider using Azure Private Link to remove access to your workspace from public networks.
Benefit: Connections to public endpoints are secured with end-to-end encryption. If you require a private endpoint, you can use Azure Private Link to allow resources to connect to your Log Analytics workspace through authorized private networks. Private Link can also be used to force workspace data ingestion through ExpressRoute or a VPN. See Design your Azure Private Link setup to determine the best network and DNS topology for your environment.
Recommendation: Use customer-managed keys if you require your own encryption key to protect data and saved queries in your workspaces.
Benefit: Azure Monitor ensures that all data and saved queries are encrypted at rest using Microsoft-managed keys (MMK). If you require your own encryption key and collect enough data for a dedicated cluster, use customer-managed keys for greater flexibility and control over the key lifecycle. If you use Microsoft Sentinel, make sure that you're familiar with the considerations in Set up Microsoft Sentinel customer-managed key.
Recommendation: Export audit data for long-term retention or immutability.
Benefit: You might have collected audit data in your workspace that's subject to regulations requiring its long-term retention. Data in a Log Analytics workspace can't be altered, but it can be purged. Use data export to send data to an Azure storage account with immutability policies to protect against data tampering. Not every type of log has the same relevance for compliance, auditing, or security, so determine the specific data types that should be exported.
Recommendation: Configure log query auditing to track which users are running queries.
Benefit: Log query auditing records the details of each query that's run in a workspace. Treat this audit data as security data and secure the LAQueryLogs table appropriately. Configure the audit logs for each workspace to be sent to the local workspace, or consolidate them in a dedicated security workspace if you separate your operational and security data. Use Log Analytics workspace insights to periodically review this data, and consider creating log search alert rules to proactively notify you if unauthorized users are attempting to run queries.
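Reviewing audit data for unauthorized callers amounts to comparing each query's user against an allowlist. The following sketch assumes the audit rows have already been retrieved from LAQueryLogs and reduced to dictionaries with a hypothetical `user` key; it doesn't query Azure, and the key name doesn't reflect the table's real column names.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def unauthorized_query_counts(audit_rows: Iterable[Dict[str, str]],
                              allowed_users: Set[str]) -> Counter:
    """Count queries per user for users who aren't on the allowlist."""
    allowed = {u.lower() for u in allowed_users}  # Compare identities case-insensitively.
    counts: Counter = Counter()
    for row in audit_rows:
        user = row.get("user", "").lower()
        if user not in allowed:
            counts[user] += 1
    return counts
```

In practice, the same comparison would more naturally live in the log search alert rule itself, so that notification happens without exporting the audit data.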
Recommendation: Determine a strategy to filter or obfuscate sensitive data in your workspace.
Benefit: You might be collecting data that includes sensitive information. Filter out records that shouldn't be collected by using the configuration for the particular data source. Use a transformation if only particular columns in the data should be removed or obfuscated.

If you have standards that require the original data to be unmodified, then you can use the 'h' literal in KQL queries to obfuscate query results displayed in workbooks.
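Transformations in Azure Monitor are written in KQL inside a data collection rule. The Python sketch below only illustrates the effect such a transformation has on a single record: it drops one set of columns and replaces another with a truncated SHA-256 digest. The column names are hypothetical, and hashing is pseudonymization rather than anonymization, since low-entropy values such as short IDs can be recovered by brute force.

```python
import hashlib
from typing import Any, Dict, Set


def transform_record(record: Dict[str, Any], drop_columns: Set[str],
                     mask_columns: Set[str]) -> Dict[str, Any]:
    """Return a copy of `record` with some columns removed and others pseudonymized."""
    result: Dict[str, Any] = {}
    for column, value in record.items():
        if column in drop_columns:
            continue  # The column is never stored.
        if column in mask_columns and value is not None:
            # Replace the value with a short, stable digest so rows can still be
            # correlated without exposing the original value.
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            result[column] = digest[:16]
        else:
            result[column] = value
    return result
```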
Recommendation: Purge sensitive data that was accidentally collected.
Benefit: Check periodically for private data that might have been accidentally collected in your workspace, and use data purge to remove it.
Recommendation: Enable Customer Lockbox for Microsoft Azure to approve or reject Microsoft data access requests.
Benefit: Customer Lockbox for Microsoft Azure provides an interface to review and approve or reject customer data access requests. It's used in cases where a Microsoft engineer needs to access customer data, whether in response to a customer-initiated support ticket or a problem identified by Microsoft. To enable Customer Lockbox, you need a dedicated cluster.
Lockbox can't currently be applied to tables with the Auxiliary plan.

Cost optimization

Cost optimization refers to ways to reduce unnecessary expenses and improve operational efficiencies. You can significantly reduce your cost for Azure Monitor by understanding your different configuration options and opportunities to reduce the amount of data that it collects. See Azure Monitor cost and usage to understand the different ways that Azure Monitor charges and how to view your monthly bill.

Note

See Optimize costs in Azure Monitor for cost optimization recommendations across all features of Azure Monitor.

Design checklist

  • Determine whether to combine your operational data and your security data in the same Log Analytics workspace.
  • Configure pricing tier for the amount of data that each Log Analytics workspace typically collects.
  • Configure data retention and archiving.
  • Configure tables used for debugging, troubleshooting, and auditing as Basic Logs.
  • Limit data collection from data sources for the workspace.
  • Regularly analyze collected data to identify trends and anomalies.
  • Create an alert when data collection is high.
  • Consider a daily cap as a preventative measure to ensure that you don't exceed a particular budget.
  • Set up alerts on Azure Advisor cost recommendations for Log Analytics workspaces.

Configuration recommendations

Recommendation: Determine whether to combine your operational data and your security data in the same Log Analytics workspace.
Benefit: Because all data in a Log Analytics workspace is subject to Microsoft Sentinel pricing if Sentinel is enabled, there might be cost implications to combining this data. See Design a Log Analytics workspace strategy for details on making this decision for your environment, balancing it against criteria in the other pillars.
Recommendation: Configure the pricing tier for the amount of data that each Log Analytics workspace typically collects.
Benefit: By default, Log Analytics workspaces use pay-as-you-go pricing with no minimum data volume. If you collect enough data, you can significantly decrease your cost by using a commitment tier, which allows you to commit to a daily minimum of data collected in exchange for a lower rate. If you collect enough data across workspaces in a single region, you can link them to a dedicated cluster and combine their collected volume using cluster pricing.

See Azure Monitor Logs cost calculations and options for details on commitment tiers and guidance on determining which is most appropriate for your level of usage. See Usage and estimated costs to view estimated costs for your usage at different pricing tiers.
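To see how the break-even point between pay-as-you-go and a commitment tier works, here is a sketch comparing the two models for one workspace. The prices are hypothetical placeholders, not Azure rates (use Usage and estimated costs for real figures), and the overage rule in `commitment_tier_cost` is an assumption stated in its docstring.

```python
# Hypothetical placeholder prices, for illustration only.
HYPOTHETICAL_PAYG_PRICE_PER_GB = 2.0
HYPOTHETICAL_TIER_GB_PER_DAY = 100.0
HYPOTHETICAL_TIER_PRICE_PER_DAY = 150.0


def pay_as_you_go_cost(gb_per_day: float, price_per_gb: float) -> float:
    """Daily cost when every GB is billed at the pay-as-you-go rate."""
    return gb_per_day * price_per_gb


def commitment_tier_cost(gb_per_day: float, tier_gb_per_day: float,
                         tier_price_per_day: float) -> float:
    """Daily cost on a commitment tier.

    Assumption: you always pay for the committed volume, and any overage is
    billed at the tier's effective per-GB rate.
    """
    effective_rate = tier_price_per_day / tier_gb_per_day
    overage_gb = max(0.0, gb_per_day - tier_gb_per_day)
    return tier_price_per_day + overage_gb * effective_rate


def break_even_gb_per_day(tier_price_per_day: float, price_per_gb: float) -> float:
    """Daily volume above which the commitment tier is cheaper than pay-as-you-go."""
    return tier_price_per_day / price_per_gb
```

With these placeholder prices, the commitment tier becomes cheaper above 75 GB per day: at 80 GB per day it costs 150 against 160 for pay-as-you-go.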
Recommendation: Configure interactive and long-term data retention.
Benefit: There's a charge for retaining data in a Log Analytics workspace beyond the default of 31 days (90 days if Microsoft Sentinel is enabled on the workspace, and 90 days for Application Insights data). Consider your particular requirements for having data readily available for log queries. You can significantly reduce your cost by configuring long-term retention, which allows you to retain data for up to twelve years and still access it occasionally by using search jobs or by restoring a set of data to the workspace.
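The included-retention rule above depends on two conditions, which the following sketch encodes directly. It only counts days of interactive retention beyond the included default; it doesn't model per-GB retention pricing or long-term retention, and the function names are hypothetical.

```python
def included_retention_days(sentinel_enabled: bool = False,
                            application_insights: bool = False) -> int:
    """Interactive retention included at no extra charge, per the defaults above."""
    return 90 if sentinel_enabled or application_insights else 31


def billable_retention_days(configured_days: int, sentinel_enabled: bool = False,
                            application_insights: bool = False) -> int:
    """Days of configured interactive retention that exceed the included default."""
    included = included_retention_days(sentinel_enabled, application_insights)
    return max(0, configured_days - included)
```

For example, keeping 90 days of interactive data in a workspace without Sentinel means 59 days beyond the included 31.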
Recommendation: Configure tables used for debugging, troubleshooting, and auditing as Basic Logs.
Benefit: Tables in a Log Analytics workspace configured for Basic Logs have a lower ingestion cost in exchange for limited features and a charge for log queries. If you query these tables infrequently and don't use them for alerting, the query cost can be more than offset by the reduced ingestion cost.
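Whether Basic Logs pays off comes down to how much data you scan with queries compared with what you save on ingestion. The sketch below computes that threshold for one table; all prices are hypothetical placeholders passed in by the caller, and the model assumes query charges scale with the volume of data scanned.

```python
def monthly_cost_analytics(ingested_gb: float, analytics_price_per_gb: float) -> float:
    """Monthly cost of a table on the Analytics plan (ingestion only)."""
    return ingested_gb * analytics_price_per_gb


def monthly_cost_basic(ingested_gb: float, basic_price_per_gb: float,
                       scanned_gb: float, query_price_per_gb_scanned: float) -> float:
    """Monthly cost on the Basic plan: cheaper ingestion plus a per-GB query charge."""
    return ingested_gb * basic_price_per_gb + scanned_gb * query_price_per_gb_scanned


def max_scanned_gb_for_savings(ingested_gb: float, analytics_price_per_gb: float,
                               basic_price_per_gb: float,
                               query_price_per_gb_scanned: float) -> float:
    """Scanned volume at which Basic stops being cheaper than Analytics."""
    savings = ingested_gb * (analytics_price_per_gb - basic_price_per_gb)
    return savings / query_price_per_gb_scanned
```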
Recommendation: Limit data collection from data sources for the workspace.
Benefit: The primary factor in the cost of Azure Monitor is the amount of data that you collect in your Log Analytics workspace, so make sure you collect no more data than you require to assess the health and performance of your services and applications. See Design a Log Analytics workspace architecture for details on making this decision for your environment, balancing it against criteria in the other pillars.

Tradeoff: There might be a tradeoff between cost and your monitoring requirements. For example, you might be able to detect a performance issue more quickly with a high sample rate, but you might want a lower sample rate to save costs. Most environments have multiple data sources with different types of collection, so you need to balance your particular requirements with your cost targets for each. See Cost optimization in Azure Monitor for recommendations on configuring collection for different data sources.
Recommendation: Regularly analyze collected data to identify trends and anomalies.
Benefit: Use Log Analytics workspace insights to periodically review the amount of data collected in your workspace. In addition to helping you understand the amount of data collected by different sources, it identifies anomalies and upward trends in data collection that could result in excess cost. Further analyze data collection by using the methods in Analyze usage in Log Analytics workspace to determine whether additional configuration can decrease your usage further. This is particularly important when you add a new set of data sources, such as a new set of virtual machines, or onboard a new service.
Recommendation: Create an alert when data collection is high.
Benefit: To avoid unexpected bills, you should be proactively notified anytime you experience excessive usage. Notification allows you to address any potential anomalies before the end of your billing period.
Recommendation: Consider a daily cap as a preventative measure to ensure that you don't exceed a particular budget.
Benefit: A daily cap disables data collection in a Log Analytics workspace for the rest of the day after your configured limit is reached. Don't use it as a method to reduce costs, as described in When to use a daily cap.

If you do set a daily cap, in addition to creating an alert when the cap is reached, ensure that you also create an alert rule to be notified when some percentage has been reached (90% for example). This gives you an opportunity to investigate and address the cause of the increased data before the cap shuts off data collection.
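The two alert thresholds recommended above, the cap itself and an earlier percentage such as 90%, can be expressed as a simple classification. This sketch assumes you already have today's ingested volume, for example from a usage query; the function name and return values are hypothetical.

```python
def daily_cap_status(ingested_gb_today: float, daily_cap_gb: float,
                     warn_percent: int = 90) -> str:
    """Classify today's ingestion against a daily cap."""
    if ingested_gb_today >= daily_cap_gb:
        return "cap reached"  # Collection stops for the rest of the day.
    # Compare in percent units to avoid floating-point surprises at the boundary.
    if ingested_gb_today * 100 >= warn_percent * daily_cap_gb:
        return "warning"  # Time to investigate before the cap is hit.
    return "ok"
```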
Recommendation: Set up alerts on Azure Advisor cost recommendations for Log Analytics workspaces.
Benefit: Azure Advisor recommendations for Log Analytics workspaces proactively alert you when there's an opportunity to optimize your costs. Create Azure Advisor alerts for these cost recommendations:
  • Consider configuring the cost effective Basic logs plan on selected tables - We've identified ingestion of more than 1 GB per month to tables that are eligible for the low cost Basic log data plan. The Basic log plan gives you query capabilities for debugging and troubleshooting at a lower cost.
  • Consider changing pricing tier - Based on your current usage volume, investigate changing your pricing (commitment) tier to receive a discount and reduce costs.
  • Consider removing unused restored tables - You have one or more tables with restored data active in your workspace. If you're no longer using the restored data, delete the table to avoid unnecessary charges.
  • Data ingestion anomaly was detected - We've identified a much higher ingestion rate over the past week, based on your ingestion in the three previous weeks. Take note of this change and the expected change in your costs.
You can also view automatically generated recommendations by selecting Overview > Recommendations or Advisor recommendations from your Log Analytics workspace resource menu.
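Azure Advisor's own detection logic isn't described here. To make the idea behind the Data ingestion anomaly recommendation concrete, the following sketch compares the most recent week with the average of the three previous weeks; the 1.5 ratio threshold is a hypothetical choice, not Advisor's.

```python
from statistics import mean
from typing import Sequence


def is_ingestion_anomaly(weekly_gb: Sequence[float], ratio_threshold: float = 1.5) -> bool:
    """Flag the latest week if it is well above the average of the three weeks before it.

    `weekly_gb` holds four weekly ingestion totals, oldest first.
    """
    if len(weekly_gb) != 4:
        raise ValueError("expected exactly four weekly totals")
    baseline = mean(weekly_gb[:3])
    latest = weekly_gb[3]
    if baseline == 0:
        return latest > 0  # Any ingestion after three empty weeks is unusual.
    return latest / baseline >= ratio_threshold
```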

Operational excellence

Operational excellence refers to the operations processes required to keep a service running reliably in production. Use the following information to minimize the operational requirements for supporting Log Analytics workspaces.

Design checklist

  • Design a workspace architecture with the minimal number of workspaces to meet your business requirements.
  • Use Infrastructure as Code (IaC) when managing multiple workspaces.
  • Use Log Analytics workspace insights to track the health and performance of your Log Analytics workspaces.
  • Create alert rules to be proactively notified of operational issues in the workspace.
  • Ensure that you have a well-defined operational process for data segregation.

Configuration recommendations

Recommendation: Design a workspace strategy to meet your business requirements.
Benefit: See Design a Log Analytics workspace architecture for guidance on designing a strategy for your Log Analytics workspaces, including how many to create and where to place them.

A single workspace, or at least a minimal number of workspaces, maximizes your operational efficiency: it limits the distribution of your operational and security data, which increases your visibility into potential issues, makes patterns easier to identify, and minimizes your maintenance requirements.

You might have requirements for multiple workspaces such as multiple tenants, or you might need workspaces in multiple regions to support your availability requirements. In these cases, ensure that you have appropriate processes in place to manage this increased complexity.
Recommendation: Use Infrastructure as Code (IaC) when managing multiple workspaces.
Benefit: Use IaC to define the details of your workspaces in ARM templates, Bicep, or Terraform. This lets you use your existing DevOps processes to deploy new workspaces, and Azure Policy to enforce their configuration.
Recommendation: Use Log Analytics workspace insights to track the health and performance of your Log Analytics workspaces.
Benefit: Log Analytics workspace insights provides a unified view of the usage, performance, health, agents, queries, and change log for all your workspaces. Review this information regularly to track the health and operation of each of your workspaces.
Recommendation: Create alert rules to be proactively notified of operational issues in the workspace.
Benefit: Each workspace has an operation table that logs important activities affecting the workspace. Create alert rules based on this table to be proactively notified when an operational issue occurs. You can use recommended alerts for the workspace to simplify the creation of the most critical alert rules.
Recommendation: Ensure that you have a well-defined operational process for data segregation.
Benefit: You might have different requirements for different types of data stored in your workspace. Make sure that you clearly understand requirements such as data retention and security when designing your workspace strategy and configuring settings such as permissions and long-term retention. You should also have a clearly defined process for occasionally purging personal data that's accidentally collected.

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. Use the following information to ensure that your Log Analytics workspaces and log queries are configured for maximum performance.

Design checklist

  • Configure log query auditing and use Log Analytics workspace insights to identify slow and inefficient queries.

Configuration recommendations

Recommendation: Configure log query auditing and use Log Analytics workspace insights to identify slow and inefficient queries.
Benefit: Log query auditing stores the compute time required to run each query and the time until results are returned. Log Analytics workspace insights uses this data to list potentially inefficient queries in your workspace. Consider rewriting these queries to improve their performance. See Optimize log queries in Azure Monitor for guidance on optimizing your log queries.
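Workspace insights does this ranking for you, but the underlying idea is easy to see in a sketch: group audit records by query text and rank by total compute time. The `QueryRun` type and its fields are hypothetical stand-ins for values recorded by log query auditing, and the records are assumed to have been retrieved already.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class QueryRun:
    query_text: str   # Hypothetical field: the text of the query that ran.
    cpu_ms: int       # Hypothetical field: compute time used by the run.
    duration_ms: int  # Hypothetical field: time until results were returned.


def rank_by_total_cpu(runs: Iterable[QueryRun],
                      top: int = 3) -> List[Tuple[str, int, int, float]]:
    """Return (query text, total CPU ms, run count, average duration ms), heaviest first."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])
    for run in runs:
        entry = totals[run.query_text]
        entry[0] += run.cpu_ms
        entry[1] += 1
        entry[2] += run.duration_ms
    ranked = [
        (text, cpu, count, duration / count)
        for text, (cpu, count, duration) in totals.items()
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

Ranking by total rather than per-run compute time surfaces queries that are individually cheap but run very often, such as those behind frequently refreshed dashboards.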
