Azure Function App (Serverless) Returning Frequent 503

Alexander Kammerer 121 Reputation points
2021-10-06T15:28:19.453+00:00

Over the last few weeks we have observed a recurring problem with the Function App (serverless, Python 3.8) that we are running.

We call the function app from a logic app via an API gateway and get back 503 Service Unavailable. The function app records no logs and no trace of the request, although the API gateway does. When we retry after a few minutes, the call succeeds.
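Since a retry after a few minutes succeeds, a caller can work around the symptom by retrying on 503 with increasing delays. The following is a minimal sketch using only the Python standard library, not a fix for the underlying cause. The names `fetch_status` and `call_with_retry`, the default delays, and the placeholder URL in the usage note are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the code is still a valid status.
        return err.code


def call_with_retry(send, max_attempts=5, base_delay=30.0, sleep=time.sleep):
    """Call send() until it returns something other than 503.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between attempts.
    Returns (last_status, attempts_used).
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status != 503:
            return status, attempt
        if attempt < max_attempts:
            # Hypothetical backoff schedule; tune it to the observed outage length.
            sleep(base_delay * 2 ** (attempt - 1))
    return status, max_attempts
```

It could be used as `call_with_retry(lambda: fetch_status("https://<gateway-host>/<route>"))`, where the URL is a placeholder. When the caller is a logic app, the retry policy on its HTTP action may be the more natural place for this kind of behavior.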

Not every request returns 503, though. At first I suspected a problem after a cold start, but that is not the case: sometimes a cold start goes through without issue, and sometimes we get 503 for a few minutes until the app finally responds.

I also noticed that when I open the Functions blade in the Azure Portal during such a period, I am told that the Function Host is unavailable. At that point I cannot see any current logs in the portal. Afterwards, I could not find anything irregular in the function logs. It seems these requests are never handled by the function app; they apparently do not get past the built-in proxy in front of the actual Python app.
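Because the function logs show nothing for the failed requests, it can help to record the status codes seen by the caller and reduce them to outage windows that can be compared against the API gateway logs. This is a minimal sketch; the function name `outage_windows` and the `(timestamp, status)` sample format are assumptions, and the samples would have to come from your own probe or gateway export.

```python
from datetime import datetime


def outage_windows(samples, failing_status=503):
    """Group consecutive failing samples into (start, end, count) windows.

    samples: iterable of (timestamp, status) pairs, sorted by timestamp.
    """
    windows = []
    start = end = None
    count = 0
    for timestamp, status in samples:
        if status == failing_status:
            if start is None:
                start = timestamp  # first failing sample opens a window
            end = timestamp
            count += 1
        elif start is not None:
            # A non-failing sample closes the current window.
            windows.append((start, end, count))
            start = end = None
            count = 0
    if start is not None:
        windows.append((start, end, count))
    return windows
```

Each returned window gives the first and last failing timestamp and the number of failing samples in between, which makes it easier to check whether the outages line up with host restarts.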

Could you please help me debug this? I am able to provide any type of logs or the name of the function app. We do not pay for support, so I cannot simply open a support ticket.

Azure Functions
An Azure service that provides an event-driven serverless compute platform.

2 answers

  1. MughundhanRaveendran-MSFT 12,506 Reputation points
    2021-10-07T07:29:48.193+00:00

    @Alexander Kammerer ,

    Thanks for reaching out.

    A 503 Service Unavailable response is typically caused by one of the following:

    • The function host is down or restarting
    • A platform issue in which the backend server is not running or not allocated
    • A memory leak or other memory issue in the code that causes the backend server to return 503
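To check the memory-leak possibility from the list above, the function's core logic can be run repeatedly on a local machine while tracking allocations. The following is a minimal sketch using the standard-library `tracemalloc` module. The helper name `memory_growth` and the `work` callable (a stand-in for the body of your function) are assumptions, and a local run will not reproduce the memory limits of the hosted plan.

```python
import tracemalloc


def memory_growth(work, repeats=50, top=5):
    """Run work() repeatedly and report where allocated memory grew the most.

    Returns a list of (source_location, size_diff_in_bytes) tuples.
    """
    tracemalloc.start()
    try:
        work()  # warm-up so one-time allocations are not counted as growth
        before = tracemalloc.take_snapshot()
        for _ in range(repeats):
            work()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Compare the two snapshots line by line; largest changes come first.
    stats = after.compare_to(before, "lineno")
    return [(str(stat.traceback), stat.size_diff) for stat in stats[:top]]
```

Source lines whose size difference keeps increasing with `repeats` are candidates for a leak, for example a module-level cache or list that is appended to on every invocation.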

    I suggest opening the "Diagnose and solve problems" blade in the Function app and selecting the "Function app down or reporting" detector. It shows diagnostic information about the function app and its infrastructure, which should give some insight into function host issues. Also check the "Web app restarted" section to see whether any platform-related events could have contributed to the 503 errors. Note that if you are running on Linux, the Web app restarted detector reports container recycles.


    Whenever you see the "Azure Functions runtime unreachable" error in the portal, see the following article to troubleshoot it:

    https://learn.microsoft.com/en-us/azure/azure-functions/functions-recover-storage-account

    I hope this helps!

    Please 'Accept as answer' and ‘Upvote’ if it helped so that it can help others in the community looking for help on similar topics.

    1 person found this answer helpful.

  2. PECORARI Emanuele 6 Reputation points
    2021-10-07T09:50:58.037+00:00

    Hi,
    we are experiencing the exact same issue with the same architecture in West Europe as @Alexander Kammerer.
    The issue is still there this morning, even though Microsoft support sent us this message:

    Microsoft Azure Team has investigated the issue you reported on Azure Functions that resulted in errors on several of your Function Apps not starting. This issue was found to be related to an issue within the Linux Consumption backend worker allocation.
    Upon investigation, engineers discovered during this time period we had an unexpected backend issue that affected the worker process allocation logic in this region and were able to mitigate the issue with a fix.
    We are continuously taking steps to improve the Azure Web App service and our processes to ensure such incidents do not occur in the future, and in this case it includes (but is not limited to):
    • Improved monitoring to detect such failures faster and take remedial action automatically.
    • Review and update throttle limits based on usage patterns for all backend services.
    We apologize for any inconvenience.

    Could you please help?

    Thanks
    Best
    Emanuele Pecorari

    1 person found this answer helpful.
