Tutorial: Analyze data in Azure Monitor Logs using a notebook

Notebooks are integrated environments that let you create and share documents with live code, equations, visualizations, and text. Integrating a notebook with a Log Analytics workspace lets you create a multi-step process that runs code in each step based on the results of the previous step. You can use such streamlined processes to build machine learning pipelines, advanced analysis tools, troubleshooting guides (TSGs) for support needs, and more.

Integrating a notebook with a Log Analytics workspace also lets you:

  • Run KQL queries and custom code in any language.
  • Introduce new analytics and visualization capabilities, such as new machine learning models, custom timelines, and process trees.
  • Integrate data sets outside of Azure Monitor Logs, such as on-premises data sets.
  • Take advantage of the higher service limits of the Query API compared to the Azure portal.

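In this tutorial, you learn how to:

  • Integrate your Log Analytics workspace with your notebook.
  • Explore and visualize data from your Log Analytics workspace in your notebook.
  • Analyze data.
  • Ingest analyzed data into a custom table in your Log Analytics workspace (optional).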

For an example of how to build a machine learning pipeline to analyze data in Azure Monitor Logs using a notebook, see this sample notebook: Detect anomalies in Azure Monitor Logs using machine learning techniques.

Prerequisites

For this tutorial, you need:

Tools and notebooks

In this tutorial, you use these tools:

Tool | Description
Azure Monitor Query client library | Lets you run read-only queries on data in Azure Monitor Logs.
Azure Identity client library | Enables Azure SDK clients to authenticate with Microsoft Entra ID.
Azure Monitor Ingestion client library | Lets you send custom logs to Azure Monitor using the Logs Ingestion API. Required to ingest analyzed data into a custom table in your Log Analytics workspace (optional).
Data collection rule, data collection endpoint, and a registered application | Required to ingest analyzed data into a custom table in your Log Analytics workspace (optional).

Other query libraries you can use include:

  • Kqlmagic library lets you run KQL queries directly inside a notebook in the same way you run KQL queries from the Log Analytics tool.
  • MSTICPY library provides templated queries that invoke built-in KQL time series and machine learning capabilities, and provides advanced visualization tools and analyses of data in a Log Analytics workspace.

Other Microsoft notebook experiences for advanced analysis include:

1. Integrate your Log Analytics workspace with your notebook

Set up your notebook to query your Log Analytics workspace:

  1. Install the Azure Monitor Query, Azure Identity, and Azure Monitor Ingestion client libraries, along with the Pandas data analysis library and the Plotly visualization library:

    import sys
    
    !{sys.executable} -m pip install --upgrade azure-monitor-query azure-identity azure-monitor-ingestion
    
    !{sys.executable} -m pip install --upgrade pandas plotly 
    
  2. Set the LOGS_WORKSPACE_ID variable below to the ID of your Log Analytics workspace. The variable is currently set to use the Azure Monitor Demo workspace, which you can use to demo the notebook.

    LOGS_WORKSPACE_ID = "DEMO_WORKSPACE"
    
  3. Set up LogsQueryClient to authenticate and query Azure Monitor Logs.

    This code sets up LogsQueryClient to authenticate using DefaultAzureCredential:

    from azure.core.credentials import AzureKeyCredential
    from azure.core.pipeline.policies import AzureKeyCredentialPolicy
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient
    
    if LOGS_WORKSPACE_ID == "DEMO_WORKSPACE":
        credential = AzureKeyCredential("DEMO_KEY")
        authentication_policy = AzureKeyCredentialPolicy(name="X-Api-Key", credential=credential)
    else:
        credential = DefaultAzureCredential()
        authentication_policy = None
    
    logs_query_client = LogsQueryClient(credential, authentication_policy=authentication_policy)
    

    LogsQueryClient typically supports authentication only with Microsoft Entra token credentials. However, you can pass a custom authentication policy to enable the use of API keys, which is what allows the client to query the demo workspace. The availability of, and access to, this demo workspace are subject to change, so we recommend using your own Log Analytics workspace.

  4. Define a helper function, called query_logs_workspace, to run a given query in the Log Analytics workspace and return the results as a Pandas DataFrame.

    import pandas as pd
    import plotly.express as px
    
    from azure.monitor.query import LogsQueryStatus
    from azure.core.exceptions import HttpResponseError
    
    
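    def query_logs_workspace(query):
        """Run a KQL query in the workspace and return the results as a Pandas DataFrame."""
        # Start with empty results so the function always has something to return,
        # even if the query fails or returns no tables.
        my_data = pd.DataFrame()
        data = []
        try:
            response = logs_query_client.query_workspace(LOGS_WORKSPACE_ID, query, timespan=None)
            if response.status == LogsQueryStatus.PARTIAL:
                # Partial results: report the error and keep the data that was returned.
                print(response.partial_error.message)
                data = response.partial_data
            elif response.status == LogsQueryStatus.SUCCESS:
                data = response.tables
            # If the response contains several tables, the last one is returned.
            for table in data:
                my_data = pd.DataFrame(data=table.rows, columns=table.columns)
        except HttpResponseError as err:
            print("The query failed:")
            print(err)
        return my_data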
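
The helper above builds its DataFrame from each table's rows and columns and keeps the last table it sees. As a minimal sketch of that row-to-column mapping, the following plain-Python snippet turns table-like objects into a list of records without calling Azure. FakeTable and tables_to_records are hypothetical names introduced only for illustration; the single assumption carried over from the helper is that each table exposes rows and columns.

```python
from collections import namedtuple

# Hypothetical stand-in for a query result table; only `columns` and `rows` are assumed.
FakeTable = namedtuple("FakeTable", ["columns", "rows"])


def tables_to_records(tables):
    """Flatten every table into a list of {column: value} dictionaries."""
    records = []
    for table in tables:
        for row in table.rows:
            # Pair each value with its column name, in order.
            records.append(dict(zip(table.columns, row)))
    return records


sample_tables = [
    FakeTable(columns=["DataType", "ActualUsage"], rows=[["Perf", 12.5], ["Usage", 0.3]]),
    FakeTable(columns=["DataType", "ActualUsage"], rows=[["Heartbeat", 4.0]]),
]
records = tables_to_records(sample_tables)
print(len(records))
```

A list of records in this shape can be passed to pd.DataFrame(records) if you prefer to keep the rows from every table rather than only the last one.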
    

2. Explore and visualize data from your Log Analytics workspace in your notebook

Let's look at some data in the workspace by running a query from the notebook:

  1. This query checks how much data (in megabytes) you ingested into each of the tables (data types) in your Log Analytics workspace during each hour of the past week:

    TABLE = "Usage"
    
    QUERY = f"""
    let starttime = 7d; // Start date for the time series, counting back from the current date
    let endtime = 0d; // today 
    {TABLE} | project TimeGenerated, DataType, Quantity 
    | where TimeGenerated between (ago(starttime)..ago(endtime))
    | summarize ActualUsage=sum(Quantity) by TimeGenerated=bin(TimeGenerated, 1h), DataType
    """
    
    df = query_logs_workspace(QUERY)
    display(df)
    

    The resulting DataFrame shows the hourly ingestion in each of the tables in the Log Analytics workspace:

    Screenshot of a DataFrame generated in a notebook with log ingestion data retrieved from a Log Analytics workspace.

  2. Now, let's view the data as a graph that shows hourly usage for various data types over time, based on the Pandas DataFrame:

    df = df.sort_values(by="TimeGenerated")
    graph = px.line(df, x='TimeGenerated', y="ActualUsage", color='DataType', title="Usage in the last week - All data types")
    graph.show()
    

    The resulting graph looks like this:

    A graph that shows the amount of data ingested into each of the tables in a Log Analytics workspace over seven days.

    You've successfully queried and visualized log data from your Log Analytics workspace in your notebook.
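
The query in step 1 hard-codes the table, the time range, and the bin size. If you expect to rerun it with different values, one option is to build the query text from parameters. The sketch below is an illustration only: build_usage_query and its parameter names are hypothetical, and it filters with a single ago() lower bound instead of the between expression used in step 1.

```python
def build_usage_query(table="Usage", days_back=7, bin_size="1h"):
    """Return a KQL query string that sums Quantity per DataType in fixed time bins."""
    if days_back <= 0:
        raise ValueError("days_back must be positive")
    return (
        f"{table}\n"
        f"| where TimeGenerated > ago({days_back}d)\n"
        "| project TimeGenerated, DataType, Quantity\n"
        f"| summarize ActualUsage=sum(Quantity) by TimeGenerated=bin(TimeGenerated, {bin_size}), DataType\n"
    )


# Example: daily totals for the last three days.
query = build_usage_query(days_back=3, bin_size="1d")
print(query)
```

The returned string can be passed to the query_logs_workspace helper in the same way as QUERY.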

3. Analyze data

As a simple example, let's take the first five rows:

analyzed_df = df.head(5)

For an example of how to implement machine learning techniques to analyze data in Azure Monitor Logs, see this sample notebook: Detect anomalies in Azure Monitor Logs using machine learning techniques.
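
If you want something slightly richer than the first five rows while experimenting, a simple statistical check is one option. The sketch below flags records whose usage is more than a chosen number of standard deviations above the mean for their data type. It is not the technique used in the sample notebook. flag_high_usage and the threshold value are assumptions made for illustration, and the function works on plain records (for example, the output of df.to_dict("records")) so that it runs without workspace access.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_high_usage(records, threshold=2.0):
    """Return records whose ActualUsage exceeds mean + threshold * std for their DataType."""
    by_type = defaultdict(list)
    for record in records:
        by_type[record["DataType"]].append(record["ActualUsage"])

    limits = {}
    for data_type, values in by_type.items():
        spread = pstdev(values)
        # Skip data types with no variation; every value equals the mean there.
        if spread > 0:
            limits[data_type] = mean(values) + threshold * spread

    return [
        record
        for record in records
        if record["DataType"] in limits and record["ActualUsage"] > limits[record["DataType"]]
    ]


# Made-up sample data: nine ordinary hours and one spike for "Perf", constant "Heartbeat".
sample = [{"DataType": "Perf", "ActualUsage": 10.0} for _ in range(9)]
sample.append({"DataType": "Perf", "ActualUsage": 50.0})
sample += [{"DataType": "Heartbeat", "ActualUsage": 4.0} for _ in range(3)]

flagged = flag_high_usage(sample)
print(flagged)
```

The sample values are invented for the example and don't come from a real workspace.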

4. Ingest analyzed data into a custom table in your Log Analytics workspace (optional)

Send your analysis results to a custom table in your Log Analytics workspace to trigger alerts or to make them available for further analysis.

  1. To send data to your Log Analytics workspace, you need a custom table, data collection endpoint, data collection rule, and a registered Microsoft Entra application with permission to use the data collection rule, as explained in Tutorial: Send data to Azure Monitor Logs with Logs ingestion API (Azure portal).

    When you create your custom table:

    1. Upload this sample file to define the table schema:

      [
        {
          "TimeGenerated": "2023-03-19T19:56:43.7447391Z",    
          "ActualUsage": 40.1,    
          "DataType": "AzureDiagnostics"     
        } 
      ]
      
  2. Define the constants you need for the Logs Ingestion API:

    import os
    
    os.environ['AZURE_TENANT_ID'] = "<Tenant ID>"  # ID of the tenant where the data collection endpoint resides
    os.environ['AZURE_CLIENT_ID'] = "<Application ID>"  # ID of the application to which you granted permissions to your data collection rule
    os.environ['AZURE_CLIENT_SECRET'] = "<Client secret>"  # Secret created for the application
    
    os.environ['LOGS_DCR_STREAM_NAME'] = "<Custom stream name>"  # Name of the custom stream from the data collection rule
    os.environ['LOGS_DCR_RULE_ID'] = "<Data collection rule immutableId>"  # immutableId of your data collection rule
    os.environ['DATA_COLLECTION_ENDPOINT'] = "<Logs ingestion URL of your endpoint>"  # URL that looks like this: https://xxxx.ingest.monitor.azure.com
    
  3. Ingest the data into the custom table in your Log Analytics workspace:

    import json
    import os
    
    from azure.core.exceptions import HttpResponseError
    from azure.identity import ClientSecretCredential
    from azure.monitor.ingestion import LogsIngestionClient
    
    # Read the values defined in the previous step from the environment.
    credential = ClientSecretCredential(
        tenant_id=os.environ['AZURE_TENANT_ID'],
        client_id=os.environ['AZURE_CLIENT_ID'],
        client_secret=os.environ['AZURE_CLIENT_SECRET']
    )
    
    client = LogsIngestionClient(endpoint=os.environ['DATA_COLLECTION_ENDPOINT'], credential=credential, logging_enable=True)
    
    # Convert the DataFrame to a list of JSON-compatible records with ISO 8601 timestamps.
    body = json.loads(analyzed_df.to_json(orient='records', date_format='iso'))
    
    try:
        client.upload(rule_id=os.environ['LOGS_DCR_RULE_ID'], stream_name=os.environ['LOGS_DCR_STREAM_NAME'], logs=body)
        print("Upload request accepted")
    except HttpResponseError as e:
        print(f"Upload failed: {e}")
    

    Note

    When you create a table in your Log Analytics workspace, it can take up to 15 minutes for ingested data to appear in the table.

  4. Verify that the data now appears in your custom table.

    Screenshot that shows a query in Log Analytics on a custom table into which the analysis results from the notebook were ingested.
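
If the data doesn't appear in the table, one thing to check is whether the records you uploaded have the same fields as the sample file that defined the table schema. The following sketch performs that check locally before upload. find_invalid_records and EXPECTED_FIELDS are hypothetical names, the field list is taken from the sample schema file above, and the check makes no claim about how the service itself validates incoming data.

```python
# Field names and types taken from the sample schema file; adjust them to match your own table.
EXPECTED_FIELDS = {"TimeGenerated": str, "ActualUsage": (int, float), "DataType": str}


def find_invalid_records(records):
    """Return (index, reason) pairs for records that do not match EXPECTED_FIELDS."""
    problems = []
    for index, record in enumerate(records):
        for field, expected_type in EXPECTED_FIELDS.items():
            if field not in record:
                problems.append((index, f"missing {field}"))
            elif not isinstance(record[field], expected_type):
                problems.append((index, f"{field} has type {type(record[field]).__name__}"))
    return problems


# Made-up records: one valid, one with a missing field, one with a wrong type.
sample_body = [
    {"TimeGenerated": "2023-03-19T19:00:00.000Z", "ActualUsage": 40.1, "DataType": "AzureDiagnostics"},
    {"TimeGenerated": "2023-03-19T20:00:00.000Z", "DataType": "Perf"},
    {"TimeGenerated": "2023-03-19T21:00:00.000Z", "ActualUsage": "40.1", "DataType": "Perf"},
]
problems = find_invalid_records(sample_body)
print(problems)
```

In the notebook, you could call find_invalid_records(body) just before client.upload and skip the upload if the returned list isn't empty.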

Next steps

Learn more about how to: