I am splitting your problem into 4 parts:
How to capture Logs from ADF?
To capture logs from ADF, start by enabling diagnostic settings on your Data Factory instance in the Azure portal. Navigate to "Diagnostic settings" under the "Monitoring" section and add a new diagnostic setting, selecting the log categories you need (for example, PipelineRuns and ActivityRuns). As the destination you can choose a Log Analytics workspace, an Azure Storage Account, or an Event Hub; the Event Hub route is the one used in the next part.
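If you would rather script this than click through the portal, here is a minimal sketch using the azure-identity and azure-mgmt-monitor Python packages (my assumption, not something ADF requires); the subscription, resource group, factory, and workspace values are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import DiagnosticSettingsResource, LogSettings

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "<resource-group>"     # placeholder
FACTORY_NAME = "<data-factory-name>"    # placeholder
WORKSPACE_ID = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
                "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>")

monitor = MonitorManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Full resource ID of the Data Factory instance the diagnostic setting is attached to.
adf_id = (f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
          f"/providers/Microsoft.DataFactory/factories/{FACTORY_NAME}")

# Route the pipeline/activity/trigger run logs to a Log Analytics workspace.
monitor.diagnostic_settings.create_or_update(
    resource_uri=adf_id,
    name="adf-pipeline-logs",
    parameters=DiagnosticSettingsResource(
        workspace_id=WORKSPACE_ID,
        logs=[
            LogSettings(category="PipelineRuns", enabled=True),
            LogSettings(category="ActivityRuns", enabled=True),
            LogSettings(category="TriggerRuns", enabled=True),
        ],
    ),
)
```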
Stream Logs to Azure Event Hub:
Next, create an Azure Event Hub to serve as the log ingestion point. Establish an Event Hub namespace and create an Event Hub within it. Then configure your ADF diagnostic settings to stream the captured logs directly to this Event Hub. With this in place, logs flow from ADF to the Event Hub in near real time, so they can be processed and analyzed as they arrive.
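Again purely as an optional sketch, the namespace and hub can also be created programmatically with the azure-mgmt-eventhub package; the names, region, and sizing below are placeholder assumptions:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "<resource-group>"     # placeholder
NAMESPACE = "adf-logs-ns"               # placeholder namespace name
HUB_NAME = "adf-logs"                   # placeholder event hub name

eh_client = EventHubManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Create the Event Hub namespace (long-running operation), then the hub itself.
eh_client.namespaces.begin_create_or_update(
    RESOURCE_GROUP, NAMESPACE,
    {"location": "eastus", "sku": {"name": "Standard", "tier": "Standard"}},
).result()

eh_client.event_hubs.create_or_update(
    RESOURCE_GROUP, NAMESPACE, HUB_NAME,
    {"partition_count": 2, "message_retention_in_days": 1},
)

# The ADF diagnostic setting from the previous part then targets this hub by
# supplying event_hub_authorization_rule_id (e.g. the namespace's
# RootManageSharedAccessKey rule) and event_hub_name instead of workspace_id.
```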
Process Logs in Azure Databricks:
In this step, set up an Azure Databricks workspace if you don't already have one. Install the azure-eventhubs-spark library on your Databricks cluster to enable connectivity with Azure Event Hub. Using PySpark, read the event data from the Event Hub into Databricks. Define a schema that matches the structure of your logs and parse the incoming JSON payloads accordingly. Stream the parsed logs into a Delta table within Databricks for efficient storage and querying.
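A minimal PySpark sketch of that flow, assuming the azure-eventhubs-spark connector is installed on the cluster and that the log payload contains the fields shown below (adjust the schema, connection string, and paths to your setup):

```python
from pyspark.sql.functions import col, explode, from_json
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Event Hub connection string (with EntityPath); the connector expects it encrypted.
# Namespace, key, and hub name are placeholders.
conn_str = ("Endpoint=sb://<namespace>.servicebus.windows.net/;"
            "SharedAccessKeyName=RootManageSharedAccessKey;"
            "SharedAccessKey=<key>;EntityPath=<event-hub-name>")
eh_conf = {
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn_str)
}

# Azure diagnostic logs arrive as a JSON envelope {"records": [...]}; the record
# fields below are an assumed subset -- align them with your actual payload.
record_schema = StructType([
    StructField("time", StringType()),
    StructField("category", StringType()),
    StructField("pipelineName", StringType()),
    StructField("status", StringType()),
    StructField("start", StringType()),
    StructField("end", StringType()),
])
envelope_schema = StructType([StructField("records", ArrayType(record_schema))])

# Read the raw events, parse the JSON body, and flatten one row per log record.
raw = spark.readStream.format("eventhubs").options(**eh_conf).load()

parsed = (raw
          .select(from_json(col("body").cast("string"), envelope_schema).alias("e"))
          .select(explode(col("e.records")).alias("r"))
          .select("r.*"))

# Append the parsed records to a Delta table (.toTable needs Spark 3.1+;
# on older runtimes write to a path with .start() instead).
(parsed.writeStream
       .format("delta")
       .outputMode("append")
       .option("checkpointLocation", "/tmp/checkpoints/pipeline_logs")
       .toTable("pipeline_logs"))
```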
Create Dashboard in Databricks
Create a Delta table named pipeline_logs in your Databricks workspace, where the processed logs will be stored. Using Databricks SQL, build a dashboard to visualize the logs. Write SQL queries to extract and display the necessary metrics, such as the number of completed pipelines and their respective execution times. For instance, you can write a query to list pipeline names, their run counts, and the last run times, filtered for completed status.
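A query along these lines could back that dashboard widget; the column names follow the schema assumed in the streaming sketch above, and the status value is what ADF typically reports for a completed run, so adapt both to whatever you actually parsed:

```sql
-- Run counts and most recent run time per pipeline, restricted to completed runs.
SELECT
  pipelineName,
  COUNT(*)                       AS run_count,
  MAX(CAST(`end` AS TIMESTAMP))  AS last_run_time
FROM pipeline_logs
WHERE category = 'PipelineRuns'
  AND status = 'Succeeded'
GROUP BY pipelineName
ORDER BY last_run_time DESC;
```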