Hi Domenico Fasano,
Thanks for reaching out to Microsoft Q&A.
To achieve automatic deletion of migrated blobs based on their original last modified date while considering large volumes of blobs, here’s a suggested approach using ADF, lifecycle management policies, and potentially azure functions for scalability.
Copying Blobs with Metadata
- You have already mentioned using adf to migrate blobs and capturing the '$$lastmodified' value in custom metadata for the copied blobs in the destination storage account. This will work as it preserves the original last modified date. The key is to make sure this metadata gets applied to the new blobs in the target storage account.
- In the ADF copy activity, ensure the '$$LASTMODIFIED' value is being captured and applied correctly as custom metadata. You can configure this in the copy activity under "Mapping" and store it with a custom name like 'OriginalLastModifiedDate'.
Implement Lifecycle Management in the Destination Account
- Azure Storage supports lifecycle management policies that can delete blobs based on conditions such as their last modified date or custom metadata values.
- Option A: If you can set the blob’s 'Last Modified' date to match the original modified date, you can directly apply a lifecycle management policy in the destination Storage Account to delete blobs after 'x' days of inactivity.
- Option B: If you are storing the original 'Last Modified' date in metadata (ex: 'OriginalLastModifiedDate'), you can create a lifecycle management policy that uses this metadata for deletion rules. Unfortunately, lifecycle management doesn't directly support metadata-based conditions yet, so you’ll need to rely on another method, like Azure Functions, to handle deletions based on metadata.
Azure Functions for Metadata-Based Deletion
If the lifecycle policy cannot handle your custom metadata-based expiration, you can leverage Azure Functions with a timer trigger to delete the blobs based on the 'OriginalLastModifiedDate'. Since your concern is with timeouts and large volumes, consider using durable functions to split the workload into smaller, scalable tasks.
Here’s how:
Durable Functions allow orchestration of long-running tasks and scale efficiently to process large datasets (millions of blobs).
The function can:
- List blobs from the destination Storage Account.
- Check each blob’s 'OriginalLastModifiedDate' metadata.
- Delete blobs that exceed the retention threshold.
Schedule this durable function to run at regular intervals (ex: daily) to delete blobs in batches, avoiding timeout issues.
Alternative: Logic Apps for Long-Running Operations
If you prefer a low-code option and wish to avoid writing custom function code, Azure Logic Apps can provide a workflow-based approach for deleting blobs based on metadata:
- Logic Apps can run indefinitely without timeout issues and can process blobs in chunks.
- Use a Logic App with a "List blobs" action, filter blobs by the 'OriginalLastModifiedDate' metadata, and delete blobs that meet the condition.
Scalable and Efficient Workflow
Here’s a summary of the steps:
- Migration with ADF: Ensure the '$$LASTMODIFIED' value is stored as metadata ('OriginalLastModifiedDate') during the copy operation.
- Azure Storage Lifecycle Policy: If possible, apply a lifecycle policy based on the modified date (if it fits your requirement).
- Durable Functions/Logic Apps: If custom metadata is required for deletion logic, set up a Durable Function or Logic App to check the 'OriginalLastModifiedDate' and delete expired blobs in scalable batches.
Additional Considerations
Monitoring and Logging: Ensure you have proper logging in place for tracking which blobs are deleted and if any failures occur.
Cost: Consider the potential cost of listing and processing millions of blobs, especially if the lifecycle of these blobs is short and you need to delete them frequently.
This approach balances scalability and maintainability, especially when dealing with large volumes of blobs and ensuring they are deleted according to their original last modified date.
Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.