How can I monitor Azure Data Factory pipeline runtime logs in real time with no latency?

prashanth 1 Reputation point
2024-06-03T08:13:28.44+00:00

Hi team,

I'm trying to retrieve Azure Data Factory pipeline execution logs in real time, without any latency.

I tried using Log Analytics, but I see there is latency before the logs become available.

How can I get real-time pipeline details, including In Progress and Queued runs as well?

Is there any approach I can use to get streaming logs with no latency?

Thanks in Advance!!!


2 answers

  1. innovation gadget 155 Reputation points
    2024-06-03T09:01:35.6366667+00:00

    Hello

    To achieve near real-time monitoring of Azure Data Factory (ADF) pipeline execution, including in-progress and queued states, with minimal latency, you can use Azure Data Factory's integration with Azure Event Grid together with Azure Monitor. This approach lets you capture pipeline run events as they occur and process them in real time.

    Step-by-Step Approach

    1. Enable Diagnostic Settings with Event Grid:
    • Configure Azure Data Factory to send diagnostic logs to Azure Event Grid. Event Grid provides a mechanism to react to events in real time.
    2. Create Event Subscriptions:
    • Set up an event subscription to route these events to a real-time processing service such as Azure Functions, Azure Logic Apps, or Azure Stream Analytics.
    3. Process Events in Real Time:
    • Use the chosen service to process the events and store or forward the log details to your preferred real-time monitoring system or dashboard.

    Detailed Implementation

    Step 1: Enable Diagnostic Settings

    Navigate to your Data Factory:

    • In the Azure portal, go to your Data Factory instance.

    Configure Diagnostic Settings:

    • Under the Monitoring section, click on "Diagnostic settings".
    • Add a diagnostic setting and enable the "Send to Event Grid" option.
    • Select the relevant log categories, such as **`PipelineRun`**, **`ActivityRun`**, etc.

    Step 2: Create Event Subscription

    Navigate to Event Grid:

    • Go to the Event Grid service in the Azure portal.

    Create an Event Subscription:

    • Click on "+ Event Subscription" to create a new subscription.
    • Select the Azure Data Factory instance as the publisher.
    • Configure the event subscription to filter for the required events (**`PipelineRunStarted`**, **`PipelineRunFinished`**, etc.).
    • Set the endpoint type to the service that will process the events (e.g., Azure Function, Logic App).

    Step 3: Process Events in Real-Time

    Azure Functions (Example):

    • Create an Azure Function to process the events.
    • Configure the function to trigger on Event Grid events.
    • In your function code, process the event data and write it to a real-time dashboard or a database for monitoring.

    Azure Logic Apps:

    • Create a Logic App with an Event Grid trigger.
    • Define actions to process and route the event data to your monitoring solution.

    Azure Stream Analytics:

    • Use Stream Analytics to process the event stream in real time.
    • Configure an input from Event Grid and define a query to process the pipeline events.
    • Define outputs to real-time dashboards or databases.
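    To make the processing step concrete, here is a small, self-contained sketch of the logic such a handler could run. The payload field names (`pipelineName`, `runId`, `status`) are illustrative assumptions, not a documented ADF event schema:

    ```python
    # Sketch of the event-processing logic an Event Grid handler could run.
    # The payload shape below is an assumption for illustration only.
    from typing import Dict, List

    def extract_run_info(event: Dict) -> Dict:
        """Pull the fields a monitoring dashboard would care about."""
        data = event.get("data", {})
        return {
            "pipeline": data.get("pipelineName", "unknown"),
            "run_id": data.get("runId", ""),
            "status": data.get("status", "Unknown"),  # e.g. Queued / InProgress / Succeeded
            "timestamp": event.get("eventTime", ""),
        }

    def active_runs(events: List[Dict]) -> List[Dict]:
        """Keep only runs that are still queued or in progress."""
        infos = [extract_run_info(e) for e in events]
        return [i for i in infos if i["status"] in ("Queued", "InProgress")]
    ```

    Inside an Azure Function, the same logic would run against the deserialized Event Grid payload before forwarding the result to your dashboard or store.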

    Additional Tips

    • Azure Monitor Workbooks: Combine the real-time processing with Azure Monitor Workbooks to create customizable dashboards for real-time monitoring.
    • Custom Alerts: Set up alerts based on specific pipeline statuses or conditions using Azure Monitor to get immediate notifications.
    • Latency Consideration: While Event Grid offers real-time event processing, there may still be minimal latency (usually in milliseconds) depending on the complexity of the event processing logic and the network conditions.

    By leveraging Azure Event Grid for real-time event processing and integrating it with services like Azure Functions, Logic Apps, or Stream Analytics, you can achieve near real-time monitoring of Azure Data Factory pipeline execution with minimal latency.

    Example Scenario: Real-Time Monitoring with Azure Event Grid and Azure Functions

    Let's walk through a concrete example of setting this up:

    Step 1: Enable Diagnostic Settings to Send Logs to Event Grid

    Navigate to Azure Data Factory in the Azure Portal:

    • Go to your Data Factory instance.
    • Under the "Monitoring" section, select "Diagnostic settings".

    Create Diagnostic Setting:

    • Click on "+ Add diagnostic setting".
    • Provide a name for the diagnostic setting.
    • Select "Send to Event Grid" and choose the log categories like **`PipelineRun`**, **`ActivityRun`**, etc.
    • Save the settings.

    Step 2: Create an Event Grid Subscription

    Go to Event Grid:

    • In the Azure Portal, navigate to "Event Grid".
    • Select "Event Subscriptions".

    Create an Event Subscription:

    • Click on "+ Event Subscription".
    • Choose your Azure Data Factory as the resource.
    • Set the endpoint type to "Azure Function".
    • Configure the event subscription to filter for the necessary events (**`PipelineRunStarted`**, **`PipelineRunQueued`**, **`PipelineRunFinished`**, etc.).

    Step 3: Create an Azure Function to Process Events

    Create an Azure Function App:

    • In the Azure Portal, navigate to "Function App".
    • Click on "+ Add" to create a new Function App.
    • Configure the Function App settings as needed and create it.

    Create a Function to Process Event Grid Events:

    • Once the Function App is created, go to the Functions section.
    • Click on "+ Add" to create a new function.
    • Choose the "Event Grid trigger" template.

    **Implement the Function Logic:**

    • Use the template provided to write code that processes the events. For example, here is a minimal Python function that logs event details:

    ```python
    import logging

    import azure.functions as func

    def main(event: func.EventGridEvent):
        # Log the raw Event Grid payload; replace this with code that
        # forwards the run status to your dashboard or data store.
        logging.info("ADF pipeline event: %s", event.get_json())
    ```

    **Deploy and Test the Function:**

    • Deploy the function and ensure it's configured correctly.
    • Test the setup by triggering a pipeline run in Azure Data Factory and observing the logs in the Azure Function.

    Monitoring and Dashboards

    To visualize and monitor the real-time pipeline execution details:

    • Azure Monitor Workbooks: Create custom dashboards in Azure Monitor Workbooks to visualize the real-time data processed by your Azure Function.
    • Power BI: Stream the processed data to Power BI for real-time analytics and visualization.
    • Custom Web Application: Develop a custom web application that consumes and displays real-time pipeline status using WebSockets or other real-time data streaming technologies.

    Additional Tools and Resources

    • Azure Logic Apps: If you prefer a low-code solution, use Azure Logic Apps instead of Azure Functions to process Event Grid events.
    • Azure Stream Analytics: For more complex event processing and analytics, use Azure Stream Analytics with Event Grid input and output to various real-time data sinks.
    • Azure Data Explorer: Store and analyze large volumes of real-time data using Azure Data Explorer for fast querying and visualization.

    By setting up Azure Event Grid with Azure Functions (or other real-time processing services), you can effectively monitor Azure Data Factory pipeline executions in near real time, addressing the need for timely insights into in-progress and queued pipeline runs.


  2. Srinud 1,965 Reputation points Microsoft Vendor
    2024-06-05T14:40:06.45+00:00

    Hi prashanth,

    Thank you for reaching out to us on the Microsoft Q&A forum.

    To monitor Azure Data Factory (ADF) pipeline execution logs in real-time with minimal latency, including details of in-progress and queued pipelines, you need to implement a solution that captures and processes logs as they are generated. Here are a few approaches you can consider:

    1. Azure Event Grid and Azure Functions

    Azure Event Grid can be used to get real-time notifications of events happening in your Azure Data Factory. By subscribing to the ADF events and triggering Azure Functions, you can process and log these events in real-time.

    Steps to Implement:

    1. Create an Event Grid Topic: Create an Event Grid topic to which ADF will publish events.

    2. Configure ADF to Publish Events: Configure your Data Factory to publish events to the Event Grid topic. Events such as pipeline run status, activity run status, etc., can be configured.

    3. Subscribe to Events using Azure Functions: Create an Azure Function that is triggered by events published to the Event Grid topic. This function can process the events and write the logs to a real-time logging solution like Azure Cosmos DB, Azure Table Storage, or any other low-latency data store.
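    As a sketch of step 3, the "write to a low-latency store" logic can be illustrated with an in-memory store standing in for Cosmos DB or Table Storage. The field names are assumptions for illustration; a real function would replace the dict with an upsert against the chosen store:

    ```python
    # In-memory stand-in for a low-latency store, keyed by run ID so that
    # the latest status for each pipeline run wins (an upsert).
    from typing import Dict

    class RunStatusStore:
        """Keeps the latest status per pipeline run, like an upsert by runId."""

        def __init__(self) -> None:
            self._runs: Dict[str, Dict] = {}

        def upsert(self, event: Dict) -> None:
            data = event.get("data", {})
            run_id = data.get("runId")
            if run_id:
                self._runs[run_id] = {
                    "pipeline": data.get("pipelineName"),
                    "status": data.get("status"),
                }

        def status_of(self, run_id: str) -> str:
            return self._runs.get(run_id, {}).get("status", "Unknown")
    ```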

    Please refer to the documentation below for more details:
    Link1: https://learn.microsoft.com/en-us/azure/event-grid/overview

    Link2: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-grid?tabs=isolated-process%2Cextensionv3&pivots=programming-language-csharp

    2. Azure Stream Analytics and Power BI

    Azure Stream Analytics can process streaming data in real-time. By sending ADF logs to an Azure Event Hub and using Stream Analytics to process these logs, you can visualize the logs in real-time in Power BI.

    Steps to Implement:

    1. Send ADF Logs to Event Hub: Configure ADF to send logs to Azure Event Hub.

    2. Stream Processing with Azure Stream Analytics: Create a Stream Analytics job to read data from Event Hub, process it, and output it to Power BI for real-time visualization or another real-time data store.

    3. Real-Time Dashboard: Use Power BI to create a real-time dashboard for monitoring pipeline statuses.
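    For illustration, this is the kind of aggregation such a Stream Analytics job might compute before pushing rows to a Power BI streaming dataset. A real job would express it in Stream Analytics' SQL-like query language; Python is used here only so the logic is self-contained:

    ```python
    # Illustrative stand-in for a windowed Stream Analytics aggregation:
    # count pipeline-run events per status within one window of events.
    from collections import Counter
    from typing import Dict, List

    def status_counts(events: List[Dict]) -> Dict[str, int]:
        """Count events per status, e.g. to feed a real-time status tile."""
        return dict(Counter(e.get("status", "Unknown") for e in events))
    ```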

    Please refer to the documentation below for more details:

    Link1: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-about

    Link2: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction

    Link3: https://learn.microsoft.com/en-us/power-bi/connect-data/service-real-time-streaming

    3. Custom Real-Time Monitoring Solution

    If the above solutions do not meet your needs, you can create a custom monitoring solution that directly queries the Azure Data Factory's Management API to get the status of pipelines in real-time.

    Steps to Implement:

    1. Polling ADF Management API: Write a service that periodically polls the Azure Data Factory Management API for pipeline and activity run statuses.

    2. Real-Time Data Store: Store the fetched logs in a real-time data store such as Azure Redis Cache, Azure Cosmos DB, or a similar low-latency storage service.

    3. Real-Time Dashboard: Use a real-time dashboard solution like Power BI, Grafana, or a custom-built dashboard to visualize the data.
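    The polling approach above can be sketched as follows. The actual REST call (such as the Data Factory REST API's "Pipeline Runs - Query By Factory" operation, invoked with an Azure AD token) is injected as a callable so the control flow stands alone, and the field names are assumptions for illustration:

    ```python
    # Sketch of a polling loop: fetch the current run list on each tick and
    # report a run only when its status has changed since the last poll.
    import time
    from typing import Callable, Dict, List

    def poll_pipeline_runs(fetch_runs: Callable[[], List[Dict]],
                           on_change: Callable[[Dict], None],
                           iterations: int = 3,
                           interval_s: float = 5.0) -> None:
        last_seen: Dict[str, str] = {}
        for _ in range(iterations):
            for run in fetch_runs():
                run_id, status = run["runId"], run["status"]
                if last_seen.get(run_id) != status:
                    last_seen[run_id] = status
                    on_change(run)  # e.g. push to the real-time data store
            time.sleep(interval_s)
    ```

    Note that polling can only approximate real time: the interval bounds how stale the dashboard can be, and shorter intervals trade latency for API request volume.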

    Considerations

    • Latency: The solutions mentioned above aim to minimize latency, but some latency might still be inevitable due to network delays, processing time, and the nature of the services used.

    • Complexity: Real-time solutions tend to be more complex and require careful handling of events and state.

    • Scalability: Ensure that the solution is scalable enough to handle the volume of log data generated by your Data Factory pipelines.

    If you find the provided information helpful, we would greatly appreciate your consideration in clicking the "Accept Answer and Upvote" on the post.
