Intermittent 503s on Azure Front Door from "OriginInvalidResponse"

Don Wise 25 Reputation points
2023-07-14T03:31:36.5533333+00:00

I'm using Azure Front Door as the load balancer for several services behind it. One in particular is a WordPress site on a custom Linux VM. I also use Azure Domain services, and Azure Front Door manages the SSL certs. I see 1-2% of my traffic failing with 503s, and I hit them a lot myself when using my website heavily. I cannot for the life of me fix it. Uptime on all the back ends is 100%, which leads me to see this as an Azure Front Door issue. The 503s are sparse, but they can be very jarring to the end user when they happen. Worse, this error page isn't even customizable.

I've scoured the internet for solutions and tried everything I could find, including:
https://learn.microsoft.com/en-us/azure/frontdoor/troubleshoot-issues#503-or-504-response-from-azure-front-door-after-a-few-seconds
https://learn.microsoft.com/en-us/answers/questions/955337/azure-front-door-returns-intermittent-503-response

The last article mentions that the Azure Front Door team helped fix something on their end, so I am unsure if I may need a similar fix.

I've used the following query to look at my logs and found that most of these errors are "OriginInvalidResponse." I've already walked through both links above and tried everything they suggest. Asking for help is my last resort.

// Show the 100 most recent requests that returned a 5xx status code
AzureDiagnostics
| extend Is5XX = (toint(httpStatusCode_s) >= 500 and toint(httpStatusCode_s) < 600)
| where Is5XX == true
| project TimeGenerated, requestUri_s, httpMethod_s, httpStatusCode_s, rulesEngineMatchNames_s, cacheStatus_s, errorInfo_s, originName_s, originUrl_s, domain_s
| order by TimeGenerated desc
| limit 100

Accepted answer
  1. GitaraniSharma-MSFT 50,096 Reputation points Microsoft Employee Moderator
    2023-07-21T21:06:54.9466667+00:00

    Hello @Don Wise ,

    I understand that you are receiving intermittent 503 errors on Azure Front Door with the code "OriginInvalidResponse".

    Intermittent 503 errors with "ErrorInfo: OriginInvalidResponse" are most often caused by a backend KeepAlive timeout of less than 90 seconds. When the origin has a lower idle timeout than AFD's, the 503 errors are random and low volume.

    This can happen if your backend closes a kept-alive HTTP connection right at the moment when AFD reuses the same connection for a new request.

    Let's say your backend has an HTTP keepalive timeout that is less than 90 seconds. Azure Front Door reuses connections to improve performance, so when a connection is created to handle one request, that TCP connection is kept open for reuse (HTTP keepalive). AFD has an idle keepalive timeout of 90 seconds. If your origin times out and disconnects sooner than 90 seconds, there can be a race condition that results in this error. Specifically, AFD may reuse a connection and send a new HTTP request right at the moment when the origin times out and sends a TCP FIN. AFD interprets receiving a TCP FIN after sending a new request as an invalid response, hence the error.
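
The race described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not a description of how AFD is implemented: both sides are assumed to share one idle clock, and the function names and the `one_way_delay_s` parameter (a made-up network delay of 50 ms) are hypothetical. Only the 90-second AFD idle timeout comes from the answer.

```python
AFD_IDLE_TIMEOUT_S = 90.0  # AFD idle keepalive timeout quoted in the answer


def classify_reuse(idle_s, origin_keepalive_s, one_way_delay_s=0.05,
                   afd_idle_s=AFD_IDLE_TIMEOUT_S):
    """Toy model: AFD wants to send a request on a pooled connection
    that has been idle for idle_s seconds. What happens?"""
    if idle_s >= afd_idle_s:
        # AFD has already retired the idle connection and opens a new one.
        return "new-connection"
    if idle_s + one_way_delay_s < origin_keepalive_s:
        # The request reaches the origin before its keepalive timer fires.
        return "reused"
    if idle_s < origin_keepalive_s + one_way_delay_s:
        # The request and the origin's TCP FIN cross on the wire.
        return "race"
    # The origin's FIN reached AFD earlier, so the connection is known dead.
    return "new-connection"


def race_possible(origin_keepalive_s, one_way_delay_s=0.05,
                  afd_idle_s=AFD_IDLE_TIMEOUT_S):
    """True if some idle time below AFD's timeout falls in the race window
    [origin_keepalive_s - delay, origin_keepalive_s + delay)."""
    return origin_keepalive_s - one_way_delay_s < afd_idle_s


if __name__ == "__main__":
    for origin_timeout in (5, 90, 92):
        print(origin_timeout, race_possible(origin_timeout))
```

In this model an origin timeout of 5 seconds (Apache's default `KeepAliveTimeout`) leaves a race window that AFD can hit, while a timeout above 90 seconds with a little margin leaves none, because AFD retires the connection first.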

    Unfortunately, this 90-second idle timeout is not configurable on the AFD side.

    Refer: https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-limits#front-door-to-application-back-end

    To fix this issue, your backend should hold already used connections open for at least 91 seconds, so AFD can reuse them for subsequent requests (if any). In other words, the keepalive timeout on the backend must be more than 90 seconds.

    I asked you to validate your backend configuration and to enable keepalives on your backend, or raise the existing keepalive timeout, so that it is more than 90 seconds.

    Your backend is a custom Linux VM, so this setting is configurable. You set "KeepAliveTimeout" to 92 seconds in the httpd.conf configuration file.

    Refer: https://httpd.apache.org/docs/2.4/mod/core.html#keepalive

    The httpd.conf file of your Apache server is now configured with this value (the original post showed the file in a screenshot here).
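
Since the screenshot is not available, here is a minimal sketch of how one might check an httpd.conf for this setting. The `SAMPLE_HTTPD_CONF` excerpt is hypothetical and is not the poster's actual file; only the `KeepAliveTimeout 92` value comes from the answer. The helper names are made up for illustration. The parser is deliberately naive: it ignores `Include` files, `<VirtualHost>` scoping and `KeepAlive Off`, and assumes the last matching directive wins.

```python
import re

AFD_IDLE_TIMEOUT_S = 90.0                 # AFD idle timeout quoted in the answer
APACHE_DEFAULT_KEEPALIVE_TIMEOUT_S = 5.0  # Apache 2.4 default when unset

# Hypothetical excerpt for illustration, not the poster's actual file.
SAMPLE_HTTPD_CONF = """\
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 92
"""

# Apache accepts "KeepAliveTimeout num" (seconds) or "KeepAliveTimeout numms".
_DIRECTIVE = re.compile(r"^\s*KeepAliveTimeout\s+(\d+)(ms)?\s*$",
                        re.IGNORECASE | re.MULTILINE)


def keepalive_timeout_seconds(conf_text):
    """Return the KeepAliveTimeout found in conf_text, in seconds."""
    matches = _DIRECTIVE.findall(conf_text)
    if not matches:
        return APACHE_DEFAULT_KEEPALIVE_TIMEOUT_S
    value, unit = matches[-1]  # assumption: the last occurrence wins
    return int(value) / 1000.0 if unit else float(value)


def outlasts_afd(conf_text, afd_idle_s=AFD_IDLE_TIMEOUT_S):
    """True if the origin keeps idle connections open longer than AFD does."""
    return keepalive_timeout_seconds(conf_text) > afd_idle_s


if __name__ == "__main__":
    print(keepalive_timeout_seconds(SAMPLE_HTTPD_CONF))
    print(outlasts_afd(SAMPLE_HTTPD_CONF))
```

To run the same check against a real server, read the actual file contents and pass them to `outlasts_afd`. Apache has to be reloaded or restarted before a changed value takes effect.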

    You then waited 2 days to observe the behavior and confirmed that the issue is now fixed.

    Kindly let us know if the above helps or you need further assistance on this issue.


    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.
