Data Factory monitoring by inserting activity logs into a Delta table

Hanna 220 Reputation points
2024-07-21T22:18:21.9966667+00:00

Hello, I would like to know the best way to insert Data Factory activity logs into my Databricks Delta table, so that I can build a dashboard and set up monitoring in Databricks itself. Can you help me? I would like all activity logs in the Data Factory to be inserted into the Databricks Delta table every 5 minutes; that is, if 10 pipelines are completed, the logs of those 10 are inserted into the Delta table. Please note: no logs can be missing. I want a solution that is considered good practice, economical, and efficient. Can you help me with this?

Tags: Azure Monitor, Azure Storage Accounts, Azure Databricks, Azure Data Factory

Accepted answer
  Pinaki Ghatak 4,610 Reputation points Microsoft Employee
    2024-07-22T08:34:14.5033333+00:00

    Hello @Hanna

    One way to achieve this is by using Azure Stream Analytics to stream the Data Factory activity logs to a Databricks Delta table. Here are the high-level steps:

    1. Create a Delta table in Databricks to store the Data Factory activity logs.
    2. Create an Azure Stream Analytics job to stream the Data Factory activity logs to the Delta table.
    3. Configure the Stream Analytics job to run every 5 minutes.
    4. Verify that the logs are being inserted into the Delta table (a quick verification sketch follows the table definition below).

    Here is an example of how you can create a Delta table in Databricks:

    CREATE TABLE activity_logs (
      eventTime     timestamp,
      operationName string,
      status        string,
      activityId    string,
      pipelineName  string,
      runId         string,
      message       string
    )
    USING DELTA
    LOCATION '/mnt/delta/activity_logs';
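    As a quick check for step 4, a query along these lines can confirm that fresh rows are landing in the table. This is only a sketch: it assumes the activity_logs table defined above, and the 5-minute window is an illustrative choice matching the desired schedule.

    -- Verification sketch: rows ingested in the last 5 minutes and the latest event seen.
    -- Assumes the activity_logs Delta table defined above; adjust the interval as needed.
    SELECT
      COUNT(*)       AS recent_rows,
      MAX(eventTime) AS latest_event
    FROM activity_logs
    WHERE eventTime >= current_timestamp() - INTERVAL 5 MINUTES;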
    

    Once you have created the Delta table, you can use Azure Stream Analytics to stream the Data Factory activity logs to the table. Here is an example of how you can configure the Stream Analytics job:

    1. In the Azure portal, navigate to your Stream Analytics job.
    2. Click on Inputs and then click on Add.
    3. Select Data Factory as the input source and configure the input settings.
    4. Click on Outputs and then click on Add.
    5. Select Databricks Delta as the output sink and configure the output settings.
    6. Click on Query and then enter the following query:
    SELECT
        CAST(eventTime AS timestamp) AS eventTime,
        operationName,
        status,
        activityId,
        pipelineName,
        runId,
        message
    INTO activity_logs
    FROM DataFactoryInput
    
    7. Click on Save and then click on Start to start the Stream Analytics job.

    Once the job is running, it will stream the Data Factory activity logs to the Delta table every 5 minutes. You can then use Databricks to create a dashboard and monitor the activity logs in real time; a sample dashboard query is sketched below.
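    As a starting point for that dashboard, a summary query along these lines could back a simple monitoring visual. This is a sketch only: it assumes the activity_logs table created earlier, and the 24-hour window and grouping are illustrative choices rather than anything prescribed by Data Factory or Databricks.

    -- Dashboard sketch: pipeline run counts by status over the last 24 hours.
    -- Assumes the activity_logs Delta table created earlier; tune the window as needed.
    SELECT
      pipelineName,
      status,
      COUNT(DISTINCT runId) AS runs,
      MAX(eventTime)        AS last_seen
    FROM activity_logs
    WHERE eventTime >= current_timestamp() - INTERVAL 24 HOURS
    GROUP BY pipelineName, status
    ORDER BY last_seen DESC;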

    I hope that this response has addressed your query and helped you overcome your challenges. If so, please mark this response as Answered. This will not only acknowledge our efforts, but also assist other community members who may be looking for similar solutions.

