Data Factory logs for dashboard on Databricks

Hanna 220 Reputation points
2024-07-18T01:53:11.08+00:00

Hello! I need help solving a problem: I have to take the logs of completed pipelines with specific names and insert them into a Delta table inside Databricks, so I can build dashboards on top of it. The solution needs to use Azure tools end to end. I'm having a lot of difficulty finding a scalable, low-cost approach, because this is for monitoring and requires low latency. Also, the dashboard must live in Databricks, since Delta tables are already consumed there, so I need to create a table there containing the information in near real time, with records deleted one week after they are inserted. Could you help me with this? Any tips or strategies? Thank you from the bottom of my heart!

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. Amira Bedhiafi 26,101 Reputation points
    2024-07-19T16:54:22.3733333+00:00

    I am splitting your problem into 4 parts:

    How to capture Logs from ADF?

    To capture logs from ADF, start by enabling diagnostic settings on your Data Factory instance in the Azure portal. Navigate to "Diagnostic settings" under the "Monitoring" section and add a new diagnostic setting. There, select the log categories you need (for example, PipelineRuns) and choose where to send them: a Log Analytics workspace, an Azure Storage account, or an Event Hub, which is the destination used in the next step for low-latency streaming.

    Stream Logs to Azure Event Hub:

    Next, create an Azure Event Hub to serve as the log ingestion point. Establish an Event Hub namespace and create an Event Hub within this namespace. Following this, configure your ADF diagnostic settings to stream the captured logs directly to this Event Hub. This configuration will allow for real-time log streaming from ADF to the Event Hub, facilitating immediate log processing and analysis.
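
    If you prefer to script this configuration rather than click through the portal, the diagnostic setting can also be created with the Azure SDK for Python. The sketch below is only illustrative: it assumes the azure-identity and azure-mgmt-monitor packages, and every identifier (subscription, resource group, factory name, Event Hub namespace, authorization rule, hub name) is a placeholder you would replace with your own values.

    ```python
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.monitor import MonitorManagementClient
    from azure.mgmt.monitor.models import DiagnosticSettingsResource, LogSettings

    # Hypothetical identifiers -- replace with your own subscription, resource group,
    # Data Factory name, Event Hub namespace, and authorization rule.
    subscription_id = "<subscription-id>"
    adf_resource_id = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.DataFactory/factories/<factory-name>"
    )
    eventhub_auth_rule_id = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.EventHub/namespaces/<namespace>"
        "/authorizationRules/RootManageSharedAccessKey"
    )

    monitor_client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

    # Send pipeline and activity run logs from ADF to the Event Hub named "adf-logs".
    monitor_client.diagnostic_settings.create_or_update(
        resource_uri=adf_resource_id,
        name="adf-logs-to-eventhub",
        parameters=DiagnosticSettingsResource(
            event_hub_authorization_rule_id=eventhub_auth_rule_id,
            event_hub_name="adf-logs",
            logs=[
                LogSettings(category="PipelineRuns", enabled=True),
                LogSettings(category="ActivityRuns", enabled=True),
            ],
        ),
    )
    ```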

    Process Logs in Azure Databricks:

    In this step, set up an Azure Databricks workspace if you don't already have one. Install the azure-eventhubs-spark library in your Databricks cluster to enable connectivity with Azure Event Hub. Using PySpark, read the event data from the Event Hub into Databricks. Define a schema that matches the structure of your logs and parse the incoming JSON payloads accordingly. Stream the parsed logs into a Delta table within Databricks for efficient storage and query capabilities.
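
    A minimal PySpark sketch of this step is shown below. It assumes the azure-eventhubs-spark connector is installed on the cluster and that the code runs in a Databricks notebook where spark and sc already exist; the connection string, Event Hub name, checkpoint path, and log schema are placeholders to adapt to your actual diagnostic log payload (ADF diagnostic events arrive as JSON, sometimes wrapped in a records array, so adjust the parsing to match).

    ```python
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    # Hypothetical connection details -- replace with your Event Hub namespace values.
    connection_string = (
        "Endpoint=sb://<namespace>.servicebus.windows.net/;"
        "SharedAccessKeyName=RootManageSharedAccessKey;"
        "SharedAccessKey=<key>;EntityPath=adf-logs"
    )
    eh_conf = {
        # The connector expects the connection string in encrypted form.
        "eventhubs.connectionString":
            sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
    }

    # Assumed schema -- match it to the fields your ADF diagnostic logs actually emit.
    log_schema = StructType([
        StructField("pipelineName", StringType()),
        StructField("runId", StringType()),
        StructField("status", StringType()),
        StructField("start", TimestampType()),
        StructField("end", TimestampType()),
    ])

    # Read the event stream; Event Hub delivers each log as a binary payload in `body`.
    raw = (spark.readStream
           .format("eventhubs")
           .options(**eh_conf)
           .load())

    parsed = (raw
              .select(from_json(col("body").cast("string"), log_schema).alias("log"))
              .select("log.*"))

    # Append the parsed events to a Delta table that the dashboard will query.
    (parsed.writeStream
           .format("delta")
           .outputMode("append")
           .option("checkpointLocation", "/mnt/checkpoints/pipeline_logs")
           .toTable("pipeline_logs"))
    ```

    If you only care about completed runs of specific pipelines, you can add a .filter() on status and pipelineName before the write so the Delta table stays small.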

    Create Dashboard in Databricks

    Create a Delta table named pipeline_logs in your Databricks workspace, where the processed logs will be stored. Using Databricks SQL, build a dashboard to visualize the logs. Write SQL queries to extract and display the necessary metrics, such as the number of completed pipelines and their respective execution times. For instance, you can write a query to list pipeline names, their run counts, and the last run times, filtered for completed status.
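
    As an illustration, the dashboard query and the one-week retention asked for in the question could look like the sketch below, run from a notebook or a scheduled job. The table and column names follow the hypothetical schema from the streaming sketch above, and the 'Succeeded' status value is an assumption to adjust to your actual log payload.

    ```python
    # Summary used by the dashboard: run count and last run time per completed pipeline.
    summary = spark.sql("""
        SELECT pipelineName,
               COUNT(*)    AS run_count,
               MAX(`end`)  AS last_run_time
        FROM pipeline_logs
        WHERE status = 'Succeeded'   -- adjust to the status value in your logs
        GROUP BY pipelineName
        ORDER BY last_run_time DESC
    """)
    display(summary)

    # One-week retention from the question: remove rows older than 7 days.
    # Schedule this (for example, as a daily job) so the table does not grow unbounded.
    spark.sql("""
        DELETE FROM pipeline_logs
        WHERE `end` < current_timestamp() - INTERVAL 7 DAYS
    """)
    ```

    Since DELETE on a Delta table removes rows logically, you can also run VACUUM periodically if you want to reclaim the underlying files.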

