Ingesting files from different on-prem folders and sending the information to a Databricks Python notebook

Parsa Bahrami 20 Reputation points
2023-06-11T16:58:04.31+00:00

Hello everyone!

So I have files in 10 different folders under one parent folder in an on-prem environment. I need to set up file arrival triggers that fire whenever a file arrives in any of those folders, and I need them to run a Databricks Python notebook. I don't want to use the File Arrival Trigger in Databricks. I already have a solution: use 10 different Azure Logic Apps and trigger an ADF pipeline that runs the Python notebook whenever a file arrives in any of the specified folders. I was wondering if there is a simpler solution using ADF, Event Grid, Logic Apps, or a Service Bus queue.

Thanks

Azure Logic Apps | Azure Databricks | Azure Data Factory

1 answer

  1. Vahid Ghafarpour 23,385 Reputation points Volunteer Moderator
    2023-06-11T17:15:13.5466667+00:00

    Azure Data Factory (ADF), Event Grid, and Logic Apps can be combined to simplify triggering a Python Databricks notebook when a file arrives in any of the specified folders. Here's one approach:

    Set up an Event Grid subscription: Create an Event Grid subscription for the parent folder where the files are located. This subscription will listen for file arrival events.
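
    A minimal sketch of that subscription using the Python management SDK, assuming the on-prem files are first landed in an Azure Storage container (Event Grid reacts to Blob Created events rather than watching on-prem folders directly); the resource IDs, container, folder name, and endpoint URL below are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventgrid import EventGridManagementClient
from azure.mgmt.eventgrid.models import (
    EventSubscription,
    EventSubscriptionFilter,
    WebHookEventSubscriptionDestination,
)

# Placeholder IDs: the storage account that receives the files copied from on-prem.
subscription_id = "<azure-subscription-id>"
storage_account_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<storage-account>"
)

client = EventGridManagementClient(DefaultAzureCredential(), subscription_id)

# Subscribe only to new blobs created under the parent folder.
event_subscription = EventSubscription(
    destination=WebHookEventSubscriptionDestination(
        endpoint_url="<logic-app-http-endpoint>"
    ),
    filter=EventSubscriptionFilter(
        included_event_types=["Microsoft.Storage.BlobCreated"],
        subject_begins_with="/blobServices/default/containers/landing/blobs/parent-folder/",
    ),
)

client.event_subscriptions.begin_create_or_update(
    storage_account_id, "file-arrival-events", event_subscription
).result()
```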

    Create an Event Grid trigger in Logic Apps: Create a Logic App with an Event Grid trigger. Configure the trigger to listen to the events from the Event Grid subscription.

    Add a condition in Logic Apps: In the Logic App workflow, add a condition step to check if the file arrival event occurred in one of the specified folders. You can use expressions or functions to evaluate the file path and determine if it matches any of the desired folders.
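
    In the Logic App this check is written with workflow expressions over the event payload (a Blob Created event carries the blob URL in data.url); as a plain-Python illustration of the logic, with hypothetical folder names:

```python
# Hypothetical names of the 10 folders under the parent folder.
WATCHED_FOLDERS = [f"folder{i:02d}" for i in range(1, 11)]

def is_relevant(event: dict) -> bool:
    """Return True if a BlobCreated event points at one of the watched folders."""
    blob_url = event.get("data", {}).get("url", "")
    # e.g. https://<account>.blob.core.windows.net/landing/parent-folder/folder03/data.csv
    return any(f"/parent-folder/{folder}/" in blob_url for folder in WATCHED_FOLDERS)
```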

    Trigger an ADF pipeline: If the condition in step 3 evaluates to true, use the "Azure Data Factory - Create a pipeline run" action in Logic Apps to trigger an ADF pipeline. Pass the necessary parameters, such as the file path, to the pipeline.
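
    The "Create a pipeline run" action wraps ADF's createRun REST call; the equivalent call from Python with the ADF management SDK looks roughly like this (resource names and the parameter name are placeholders):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), "<azure-subscription-id>"
)

# Start the pipeline and hand it the path of the file that just arrived.
run = adf_client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<data-factory-name>",
    pipeline_name="process-arrived-file",
    parameters={"file_path": "parent-folder/folder03/data.csv"},
)
print(f"Started pipeline run {run.run_id}")
```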

    Execute the Python Databricks notebook: In the triggered ADF pipeline, add an activity to execute the Python Databricks notebook. You can use the Databricks Notebook activity in ADF to run the notebook and pass the required parameters.
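
    On the notebook side, values passed through the Notebook activity's baseParameters arrive as widgets; a minimal sketch of the notebook's first cell, assuming a parameter named file_path and a hypothetical mount point and target table:

```python
# Databricks notebook cell: read the parameter passed in by the ADF Notebook activity.
dbutils.widgets.text("file_path", "")          # declare the widget with an empty default
file_path = dbutils.widgets.get("file_path")   # value supplied via baseParameters at run time

# Process the newly arrived file, e.g. append it to a Delta table.
df = spark.read.option("header", "true").csv(f"/mnt/landing/{file_path}")
df.write.mode("append").format("delta").saveAsTable("bronze.arrived_files")
```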

    P.S. The above solution assumes you have connectivity between your on-premises environment and Azure. You must set up appropriate connectivity options like Azure ExpressRoute or Azure Virtual Network Gateway to establish a secure connection.

    1 person found this answer helpful.
