New Azure Function App processes blobs that were already processed by another Function App

Mykhailo Seniutovych 0 Reputation points
2024-03-26T13:07:01.6433333+00:00

I have the following situation. I had a function running on a Windows App Service Plan that processed blobs. The function used the default LogsAndContainerScan trigger source. After some time I decided to rewrite this function, migrate it from Windows to Linux, and deploy it in an isolated environment inside a Docker container. To accomplish this I created another Function App running on a new App Service Plan for Linux. During the deployment I deployed and started the new Function App on Linux and stopped the old one on Windows.

To my great surprise, the new function started processing blobs that had been processed long ago by the previous function. After some digging and reading answers on Stack Overflow (for example this one or this one), it seems to me that the function processes a blob only if the blob has no blob receipt inside the azure-webjobs-hosts blob container. When I looked at my azure-webjobs-hosts container, I found two folders in there: one for my previous function app and one for my new function app. So I conclude that, even though receipts existed for the existing blobs, they were in the folder of the old function app. The new function app looked for the receipts in its own folder, couldn't find them, and started processing all of the blobs again. This basically means that whenever I create another function app with a blob trigger, it will try to reprocess all of the existing files.

My questions:

  1. Is my reasoning above correct, and does every new function app reprocess blobs that were already processed before? If not, why did it happen in my situation?
  2. Is there any way I can avoid this situation in the future, when I, for example, decide to create yet another function app that will operate on the same blob container?

1 answer

  1. Thomas Meads 1,506 Reputation points
    2024-03-27T18:46:16.96+00:00

    Hi,

    Q1

    The blob storage trigger for functions uses the function app name as part of the receipt, so deploying a new app would cause this to happen. Note that the receipt is scoped to the function app and contains the function app name.

    Azure Functions stores blob receipts in a container named azure-webjobs-hosts in the Azure storage account for your function app (defined by the app setting AzureWebJobsStorage). A blob receipt has the following information:

    • The triggered function (<FUNCTION_APP_NAME>.Functions.<FUNCTION_NAME>, for example: MyFunctionApp.Functions.CopyBlob)
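    To illustrate why a new app does not see the old receipts, here is a minimal sketch of how a receipt path might be composed. The layout `blobreceipts/<hostId>/<functionId>/<eTag>/<container>/<blobName>` is an assumption based on what is typically visible inside azure-webjobs-hosts, not a documented contract, and all names and values below are hypothetical.

```python
def receipt_blob_path(host_id: str, function_id: str, etag: str,
                      container: str, blob_name: str) -> str:
    """Build the ASSUMED receipt path inside the azure-webjobs-hosts container."""
    # Assumed layout: blobreceipts/<hostId>/<functionId>/<eTag>/<container>/<blobName>
    return "/".join(["blobreceipts", host_id, function_id, etag, container, blob_name])


# Hypothetical values: the same blob, looked up by the old and the new app.
old_receipt = receipt_blob_path(
    "old-windows-app", "OldWindowsApp.Functions.ProcessBlob",
    "0x8DC4D0000000000", "uploads", "report.csv")
new_receipt = receipt_blob_path(
    "new-linux-app", "NewLinuxApp.Functions.ProcessBlob",
    "0x8DC4D0000000000", "uploads", "report.csv")

print(old_receipt)
print(new_receipt)
# The two apps look for the receipt in different places, so the new app
# finds nothing and treats the blob as unprocessed.
print(old_receipt == new_receipt)
```

    Under this assumption, both the host ID and the function identifier would have to match for the new app to find an existing receipt.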

    I have also personally seen blobs reprocessed seemingly at random, and I never identified the cause. It could have been this issue or a different one.

    Q2

    I see two real options for solving this.

    Firstly, it seems that, based on this post, the host ID is what is actually used for the container names / receipts, and it is set on deployment. You can set it explicitly via the AzureFunctionsWebHost__hostid app setting. This could work as a solution, provided that you do not run multiple function apps with the same host ID at the same time: that causes a host ID collision, and on runtime v4 and up the apps will crash. More info here.

    (Disclaimer - I have not tested changing the hostID so would need validating)
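    As a rough sketch of the first option, the helper below checks a candidate host ID before it is written to the app setting. The constraints it enforces (1 to 32 characters, lowercase letters, digits and dashes, no leading, trailing or consecutive dashes) are assumed here and should be confirmed against the current documentation. The function names are hypothetical; only the setting name AzureFunctionsWebHost__hostid comes from the text above.

```python
import re

# ASSUMED host ID rules: lowercase letters, digits and single dashes,
# with no dash at the start or end, and at most 32 characters overall.
_HOST_ID_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
_MAX_HOST_ID_LENGTH = 32


def is_valid_host_id(host_id: str) -> bool:
    """Return True if host_id satisfies the assumed host ID rules."""
    return (1 <= len(host_id) <= _MAX_HOST_ID_LENGTH
            and _HOST_ID_PATTERN.fullmatch(host_id) is not None)


def host_id_setting(host_id: str) -> dict[str, str]:
    """Return the app setting that pins the host ID, or raise if it looks invalid."""
    if not is_valid_host_id(host_id):
        raise ValueError(f"not a valid host ID: {host_id!r}")
    return {"AzureFunctionsWebHost__hostid": host_id}


# Hypothetical host ID shared by the old (stopped) and the new app.
print(host_id_setting("blob-processor"))
```

    The resulting key/value pair would be added to the new app's application settings. Because the receipt also contains the function app and function name, pinning the host ID alone may not be enough, so this needs the same validation as noted in the disclaimer.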

    Alternatively, you could move each blob to a "processed" folder after processing it. This ultimately means you are not leaning on the framework, and it requires more work, but you can be assured (barring failures) that the blobs will not be processed again. I have done this for a solution in the past because of the critical impact that reprocessing the files would have had on the business.
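    A minimal sketch of that move step is shown below. It assumes `container_client` behaves like an azure-storage-blob `ContainerClient` (using `get_blob_client`, `BlobClient.url`, `start_copy_from_url` and `delete_blob`), that the credentials allow a same-account copy from the source URL, and that the trigger path only watches the hypothetical `incoming/` prefix so the copy under `processed/` does not fire the trigger again.

```python
def archive_blob(container_client, blob_name: str,
                 incoming_prefix: str = "incoming/",
                 processed_prefix: str = "processed/") -> str:
    """Copy a handled blob under processed_prefix, then delete the original.

    container_client is ASSUMED to expose the azure-storage-blob
    ContainerClient interface; the prefixes are hypothetical names.
    """
    # Drop the incoming prefix so "incoming/a.csv" becomes "processed/a.csv".
    if blob_name.startswith(incoming_prefix):
        relative_name = blob_name[len(incoming_prefix):]
    else:
        relative_name = blob_name
    target_name = processed_prefix + relative_name

    source = container_client.get_blob_client(blob_name)
    target = container_client.get_blob_client(target_name)

    # Start a server-side copy and inspect the reported status.
    result = target.start_copy_from_url(source.url)
    if result.get("copy_status") != "success":
        # Keep the original rather than risk losing data on an unfinished copy.
        raise RuntimeError(f"copy of {blob_name!r} did not complete: {result!r}")

    # Only delete the source once the copy is confirmed.
    source.delete_blob()
    return target_name
```

    The function would be called at the end of the blob-triggered handler, after the real processing has succeeded. Since blob storage has no atomic move, a failure between the copy and the delete leaves both copies in place, so the processing itself should tolerate an occasional repeat.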

    I hope this answers your question.

    If it does please mark this as accepted.
