To achieve this scenario, you can use Storage Event Triggers in Azure Data Factory (ADF). These triggers allow you to respond to events on a storage account (such as file arrival or deletion in Azure Blob Storage) and trigger pipelines accordingly. Here’s how you can set it up:
- Register with Event Grid:
- Ensure your subscription is registered with the Event Grid resource provider.
- If you’re using this feature in Azure Synapse Analytics, also register your subscription with the Data Factory resource provider.
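The registration steps above can be done with the Azure CLI (assuming you are logged in via az login and targeting the correct subscription):

```shell
# Check the current registration state of the Event Grid resource provider
az provider show --namespace Microsoft.EventGrid --query registrationState --output tsv

# Register the Event Grid resource provider (idempotent; safe to re-run)
az provider register --namespace Microsoft.EventGrid

# For Azure Synapse Analytics, also register the Data Factory provider
az provider register --namespace Microsoft.DataFactory
```

Registration can take a few minutes; re-run the show command until it reports Registered.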
- Configure Network Rules:
- If your blob storage account resides behind a private endpoint and blocks public network access, configure network rules to allow communication from blob storage to Azure Event Grid.
- You can either grant storage access to trusted Azure services (like Event Grid) or configure private endpoints for Event Grid.
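As a sketch, the trusted-services option can be enabled with the Azure CLI; the account and resource group names below are placeholders:

```shell
# Allow trusted Azure services (which include Event Grid) to bypass the
# storage account's network rules. Replace the names with your own.
az storage account update \
  --name mystorageaccount \
  --resource-group my-resource-group \
  --bypass AzureServices
```

This loosens network rules only for trusted Microsoft services; public access from other networks remains blocked.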
- Supported Storage Accounts:
- The Storage Event Trigger currently supports only Azure Data Lake Storage Gen2 and General-purpose version 2 storage accounts.
- If you’re working with SFTP Storage Events, specify the SFTP Data API under the filtering section.
- Due to an Azure Event Grid limitation, Azure Data Factory supports a maximum of 500 storage event triggers per storage account.
- Ensure that the Azure account used to log into the service and publish the storage event trigger has appropriate role-based access control (Azure RBAC) permissions on the storage account.
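The required permission is the ability to create event subscriptions on the storage account (Microsoft.EventGrid/eventSubscriptions/write). One way to grant it, assuming the built-in "EventGrid EventSubscription Contributor" role fits your setup (the assignee and scope below are placeholders):

```shell
# Grant a user permission to create event subscriptions on the storage account.
# Owner and Contributor also include this permission; replace IDs with your own.
az role assignment create \
  --assignee user@example.com \
  --role "EventGrid EventSubscription Contributor" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"
```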
- Create a Storage Event Trigger:
- In your ADF pipeline, create a new Storage Event Trigger.
- Configure it to listen for events on your storage account (e.g., new files in Blob Storage).
- When an event occurs, the trigger will activate your pipeline.
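If you prefer scripting over the ADF Studio UI, the steps above can be sketched with the Azure CLI datafactory extension (az extension add --name datafactory). The factory, pipeline, container, and path names below are all placeholders:

```shell
# Define a BlobEventsTrigger that fires when a .csv file lands under /input/
cat > trigger.json <<'EOF'
{
  "type": "BlobEventsTrigger",
  "typeProperties": {
    "blobPathBeginsWith": "/mycontainer/blobs/input/",
    "blobPathEndsWith": ".csv",
    "ignoreEmptyBlobs": true,
    "events": ["Microsoft.Storage.BlobCreated"],
    "scope": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"
  },
  "pipelines": [
    {
      "pipelineReference": { "referenceName": "MyPipeline", "type": "PipelineReference" }
    }
  ]
}
EOF

az datafactory trigger create \
  --factory-name my-data-factory \
  --resource-group my-resource-group \
  --name StorageEventTrigger1 \
  --properties @trigger.json

# Triggers are created in a stopped state; start it so events activate the pipeline
az datafactory trigger start \
  --factory-name my-data-factory \
  --resource-group my-resource-group \
  --name StorageEventTrigger1
```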
- Custom Event Payload (Optional):
- If you need to parse custom data from the event payload and pass it to your pipeline, create pipeline parameters.
- Use the expression @triggerBody().event.data.keyName to extract values from the event payload.
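For instance, if a custom event arrives with a payload like {"data": {"orderId": "123"}}, the trigger definition could map that value to a pipeline parameter. Here is a sketch of the relevant fragment; MyPipeline and orderId are hypothetical names:

```json
{
  "pipelines": [
    {
      "pipelineReference": { "referenceName": "MyPipeline", "type": "PipelineReference" },
      "parameters": {
        "orderId": "@triggerBody().event.data.orderId"
      }
    }
  ]
}
```

The pipeline must declare a matching orderId parameter for the mapping to take effect.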
Remember that Azure Data Factory natively integrates with Azure Event Grid, making it a powerful solution for event-driven data pipelines. If you encounter any issues during setup, feel free to ask for further assistance!
For more detailed information, you can refer to the official documentation.
Hope this helps. Do let us know if you have any further queries.