Functions fail with "Operations that change non-concurrent collections..." error and returns 404 status code

Christian Schreuder 11 Reputation points
2021-06-25T09:01:12.87+00:00

Hi

We have a Function App which hosts several functions called via several Azure Logic Apps. These functions do simple JSON operations like constructing a JSON object or array or looks up data in a storage table. If the function has been idle for a few hours we start seeing the below error in Log Analytics:

"The '[FUNCTION NAME]' function is in error: Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct."

The error returned to the Logic Apps simply state a status code of 404 with no content, even with a try/catch wrapped around the entire function execution. When this happens, all subsequent calls to the function in error fails with this error code until the Function App is either restarted or, strangely enough, a test run is performed on the function manually (which passes). Resubmitting the Logic Apps then manually after one of the previous actions have been performed the function works again.

The functions then works for several hours until there are a few hours of inactivity, then the entire process repeats.

Research shows that this is a common issue in .Net Core 3.1 but mostly related to Identity Server and Entity Framework and the answers there points to either a bug in .Net Core itself or in EF and has to do with thread safe dictionaries mostly.

  • Is there any way of creating thread safe JObjects?
  • Will a synchronous pattern of calling the Function Apps make a difference?
  • Why is the try/catch not returning this error? (As if the error occurs before the function is even called).

I can provide more information, logs, source code etc. if needed and have now added logging information at the start of the functions even though I suspect this will not show me anything.

The current host version of the Function App is 3.0.15733.0 and the only references are Newtonsoft.Json in all functions and Microsoft.Azure.Cosmos.Table in some of the functions.

Thank you.

Azure Functions
Azure Functions
An Azure service that provides an event-driven serverless compute platform.
3,951 questions
0 comments No comments
{count} vote

1 answer

Sort by: Most helpful
  1. Christian Schreuder 11 Reputation points
    2021-06-28T11:39:40.763+00:00

    Upon further investigation it appears this issue happens during the warmup of the Function App (as it is a consumption plan app). The logs show the following entries:

    • 6/22/2021, 2:42:17.522 AM - Host.Triggers.Warmup - Initializing Warmup Extension.
    • Init host logs....
    • 6/22/2021, 2:42:17.557 AM - Host.Startup - Loading functions metadata
    • 6/22/2021, 2:42:17.561 AM - Host.Startup - 6 functions loaded
    • 6/22/2021, 2:42:20.335 AM - Host.Startup - Generating 3 job function(s)
    • 6/22/2021, 2:42:20.343 AM - Host.Startup - Found the following functions: (only shows the 3 functions which does not have an issue)
    • 6/22/2021, 2:42:20.344 AM - Microsoft.Azure.WebJobs.Script.WebHost.WebScriptHostHttpRoutesManager - Initializing function HTTP routes

    The normal logs then continue after a start up: Host initialized, Host started, Job host started; then for each of the 3 functions which did not load the error appears.

    The functions are then executed (at least the functions which work):

    • 6/22/2021, 2:42:20.428 AM - Function.Parse-CE-Attributes - Executing 'Functions.Parse-CE-Attributes' (Reason='This functi
    • 6/22/2021, 2:42:20.642 AM - Executed 'Functions.Parse-CE-Attributes' (Succeeded, Id=08
    • 6/22/2021, 2:42:20.643 AM - Host.Metrics - {"requestId":"31a96095-83b9-442b-8209-f77a265681c0

    Then the warmup process repeats within milliseconds after the functions are triggered:

    • 6/22/2021, 2:42:20.868 AM - Host.Triggers.Warmup - Initializing Warmup Extension.
    • ...
    • 6/22/2021, 2:42:20.901 AM - Host.Startup - Loading functions metadata
    • 6/22/2021, 2:42:20.904 AM - Host.Startup - 6 functions loaded
    • 6/22/2021, 2:42:22.679 AM - Host.Startup - Generating 6 job function(s)

    All functions are then called and run successfully. After some time the function stops as per normal:

    • 6/22/2021, 2:48:49.666 AM - Microsoft.Azure.WebJobs.Hosting.JobHostService - Stopping JobHost

    The next run is then triggered, and we go through the warmup process again:

    • 6/22/2021, 4:58:48.500 AM - Host.Triggers.Warmup - Initializing Warmup Extension.
    • ....
    • 6/22/2021, 4:58:48.545 AM - Host.Startup - 6 functions loaded
    • 6/22/2021, 4:58:50.086 AM - Host.Startup - Generating 3 job function(s)

    Failed again to load those 3 functions, the other 3 loaded fine as per normal. The warmup process then repeats again a few seconds later this time, loading all 6 functions after the other functions were triggered:

    • 6/22/2021, 4:58:57.055 AM - Host.Triggers.Warmup - Initializing Warmup Extension.
    • ....
    • 6/22/2021, 4:58:58.551 AM - Host.Startup - Generating 6 job function(s)

    Now by default the Logic App would automatically retry if an error is returned; however, when the 3 functions fail to load the Logic App gets a 404 error, which obviously doesn't trigger an automatic retry and the Logic App fails.

    It should also be noted that during the 5 minute run after any of the startups, all 6 functions work as intended because of the second warmup cycle.

    This points to an issue within the Function App itself. I have tried to reproduce the error with a new Function App and Logic App which runs every 2 hours and not a single failure over the past 3 days. All functions always loaded correctly the first time.

    I will redeploy these functions in a new Function App and see if the issue persists, but I doubt it will as this specific Function App appears to be broken.

    I guess it is just one of those things that can go wrong in the cloud and no one can explain it.

    1 person found this answer helpful.
    0 comments No comments