How to reliably copy Azure Blob inventory CSV files from one Azure Blob storage account to another?

GW999 1 Reputation point
2023-01-31T11:34:35.34+00:00

Currently, when you configure a Blob inventory rule, you can only specify an output container that resides on the same storage account that is subject to the inventory operation. That is a real pain, because we want to import the .CSV files from the inventory report into Power BI so that we can report on volume consumption broken down by file extension.

Power BI requires public access to be enabled on the storage account so that it can reach the source files. Enabling public access on our source storage account is not appropriate, as it stores sensitive data. The inventory report .CSV files therefore have to be copied to another storage account, on which public access can safely be enabled for Power BI.

So at present, we have to:

  • Run a Blob inventory job on storage account A.
  • Run an Azure function that executes AzCopy.exe to copy the Blob inventory .CSV files from storage account A to storage account B.
  • Permit Power BI to import the .CSV files from storage account B.

Our Azure function run.ps1 looks like this:

param([byte[]] $InputBlob, $TriggerMetadata)

Write-Host "PowerShell blob trigger function processed blob. Name: $($TriggerMetadata.Name), Size: $($InputBlob.Length) bytes"

# Source and destination URLs, each with its SAS token appended
$src = [source path + SAS]
$dst = [destination path + SAS]

# Invoke AzCopy via the call operator; quote the pattern so PowerShell does not expand it
& "C:\home\site\wwwroot\BlobInvTrigger\azcopy.exe" copy $src $dst --recursive --overwrite true --include-pattern "*.csv"

Our function is failing with "Exception of type 'System.OutOfMemoryException' was thrown." The function is 64-bit and runs on the Consumption plan. The files being copied total 650 MB, which is below the plan's 1.5 GB RAM and 5 TB disk space allowances, so the failure is inexplicable to us.
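For context, the function's trigger binding in function.json looks roughly like the sketch below (the container path and connection name here are illustrative, not our real values). I do wonder whether the binding itself contributes to the problem: because the blob content is bound to $InputBlob as a byte array, the entire blob is loaded into the worker's memory on every invocation, and several concurrent invocations could add up even though the files total only 650 MB.

```json
{
  "bindings": [
    {
      "name": "InputBlob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "blob-inventory/{name}",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
```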

One suggestion I spotted on the web was to use AzCopy with sync so that the files are not copied locally by the function. However, sync duplicates the folder hierarchy, not just the files, and Blob inventory output is structured something like this:

<root>/<year>/<month>/<day>/<hour>/<minute>/[.CSV files]

This wouldn't work, because Power BI would be looking in a single, fixed container for the source .CSV files, while the container path would change each day.
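One workaround I have considered (a sketch only: the account names, container names, and SAS tokens below are placeholders, and it assumes the dated hierarchy shown above) is to compute the dated prefix at run time and copy just that day's .CSV files into one fixed destination container, so that Power BI always reads from the same path:

```shell
#!/bin/sh
# Sketch only: account, container, and SAS values are placeholders.
# Compute today's inventory prefix (UTC) to match the <year>/<month>/<day> layout.
prefix=$(date -u +%Y/%m/%d)

src="https://accounta.blob.core.windows.net/inventory/${prefix}"
dst="https://accountb.blob.core.windows.net/reports"

echo "Would copy ${src} -> ${dst}"
# With real SAS tokens appended, the copy itself would be something like:
#   azcopy copy "${src}/*?<SAS>" "${dst}?<SAS>" --include-pattern "*.csv"
```

Because the destination has no date folders, each day's copy lands in the same flat container that Power BI points at.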

I'm about to give up - does anyone have any creative ideas on how to satisfy my requirement?

Azure Blob Storage

1 answer

  1. Luke Murray 10,526 Reputation points MVP
    2023-01-31T23:26:44.8866667+00:00

    Take a look at Object Replication for Blob storage, and set up a filter for CSV files only.
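    A replication policy of the rough shape below (the account and container names are made up) can be supplied to `az storage account or-policy create --policy @policy.json`. One caveat: as far as I'm aware, object replication rules filter on blob-name prefixes (`prefixMatch`), not extensions, so a `*.csv`-style filter may not be directly expressible; you would filter by the inventory output prefix instead.

```json
{
  "sourceAccount": "accounta",
  "destinationAccount": "accountb",
  "rules": [
    {
      "sourceContainer": "inventory",
      "destinationContainer": "inventory-copy",
      "filters": {
        "prefixMatch": ["2023/"]
      }
    }
  ]
}
```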

    [Diagram showing how object replication works]
