Troubleshooting 502 errors in ARR
This article helps you resolve the problems related to 502 errors in Application Request Routing (ARR).
Applies to: Internet Information Services
HTTP 502 - Overview
When you work with IIS Application Request Routing (ARR) deployments, one of the errors that you might see is "HTTP 502 - Bad Gateway". The 502.3 error means that while acting as a proxy, ARR was unable to complete the request to the upstream server and send a response back to the client. This problem can happen for multiple reasons. For example, failure to connect to the server, no response from the server, or the server took too long to respond (time out). If you're able to reproduce the error by browsing the web farm from the controller, and detailed errors are enabled on the server, you may see an error similar to the following error message:
The root cause of the error determines what you should do to resolve the issue.
502.3 timeout errors
ARR uses the error code in the preceding screenshot to proxy the request and determine the source of the failure because it contains the return code from WinHTTP.
You can decode the error code with the tool err.exe. In this example, the error code maps to ERROR_WINHTTP_TIMEOUT. You can also find this information in the IIS logs for the associated website on the ARR controller. The following is an excerpt from the IIS log entry for the 502.3 error, with most of the fields trimmed for readability:
sc-status | sc-substatus | sc-win32-status | time-taken |
---|---|---|---|
502 | 3 | 12002 | 29889 |
The win32 status 12002 maps to the same ERROR_WINHTTP_TIMEOUT error reported in the error page.
What exactly timed-out?
You can check this time-out by enabling Failed Request Tracing on the IIS server. The first point that you can see, in the failed request trace log and this is where the request was sent to in the ARR_SERVER_ROUTED event. The second point is the X-ARR-LOG-ID, which you can use to track the request on the target server. This helps to trace the target or destination of the HTTP request:
77. ARR_SERVER_ROUTED RoutingReason="LoadBalancing", Server="192.168.0.216", State="Active", TotalRequests="3", FailedRequests="2", CurrentRequests="1", BytesSent="648", BytesReceived="0", ResponseTime="15225" 16:50:21.033
78. GENERAL_SET_REQUEST_HEADER HeaderName="Max-Forwards", HeaderValue="10", Replace="true" 16:50:21.033
79. GENERAL_SET_REQUEST_HEADER HeaderName="X-Forwarded-For", HeaderValue="192.168.0.204:49247", Replace="true" 16:50:21.033
80. GENERAL_SET_REQUEST_HEADER HeaderName="X-ARR-SSL", HeaderValue="", Replace="true" 16:50:21.033
81. GENERAL_SET_REQUEST_HEADER HeaderName="X-ARR-ClientCert", HeaderValue="", Replace="true" 16:50:21.033
82. GENERAL_SET_REQUEST_HEADER HeaderName="X-ARR-LOG-ID", HeaderValue="dbf06c50-adb0-4141-8c04-20bc2f193a61", Replace="true" 16:50:21.033
83. GENERAL_SET_REQUEST_HEADER HeaderName="Connection", HeaderValue="", Replace="true" 16:50:21.033
The following example shows how this might look on the target server's Failed Request Tracing logs. You can validate that you have found the correct request by matching up the "X-ARR-LOG-ID" values in both traces.
185. GENERAL_REQUEST_HEADERS Headers="Connection: Keep-Alive Content-Length: 0 Accept: */* Accept-Encoding: gzip, deflate Accept-Language: en-US Host: test Max-Forwards: 10 User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0) X-Original-URL: /time/ X-Forwarded-For: 192.168.0.204:49247 X-ARR-LOG-ID: dbf06c50-adb0-4141-8c04-20bc2f193a61
<multiple entries skipped for brevity>
345. GENERAL_FLUSH_RESPONSE_END BytesSent="0", ErrorCode="An operation was attempted on a nonexistent network connection. (0x800704cd)" 16:51:06.240
In the previous example, you can see that the ARR server disconnected before the HTTP response was sent. The timestamp for GENERAL_FLUSH_RESPONSE_END can be used as a rough guide to find the corresponding entry in the IIS logs on the destination server.
date | time | s-ip | cs-method | cs-uri-stem | cs-uri-query | s-port | cs-username | sc-status | sc-substatus | sc-win32-status | time-taken |
---|---|---|---|---|---|---|---|---|---|---|---|
2011-07-18 | 16:51:06 | 192.168.0.216 | GET | /time/ | - | 80 | - | 200 | 0 | 64 | 45208 |
IIS on the destination server logged an HTTP 200 status code, indicating that the request completed successfully. Also, the win32 status has changed to 64, which maps to ERROR_NETNAME_DELETED. This status code generally indicates that the client (ARR being the 'client' in this case) disconnected before the request completed.
Cause
The ARR server reported a timeout, which is where you should look first.
Although the member server log indicates that the response was sent in 45 seconds (45208 ms), the IIS log entry from the ARR server indicates that the time-taken is very close to 30 seconds. This indicates that ARR is timing out the request, and you can confirm this by looking at the proxy timeout in the server farm's proxy settings. By default, it's set to 30 seconds.
So, in this case, you can clearly see that the ARR timeout was shorter than the execution of the request. Therefore, you might want to check to see if this execution time was typical or if you needed to look into why the request was taking longer than expected. If this execution time was expected and normal, increasing the ARR timeout should resolve the error.
Other possible reasons for ERROR_WINHTTP_TIMEOUT include:
ResolveTimeout
: Occurs if name resolution takes longer than the specified timeout period.ConnectTimeout
: Occurs if it takes longer than the specified timeout period to connect to the server after the name resolved.SendTimeout
: Occurs if sending a request takes longer than this time-out value. The send operation is canceled.ReceiveTimeout
: Occurs if a response takes longer than this time-out value. The request is canceled.
When you observe the first two reasons, ResolveTimeout
and ConnectTimeout
, the troubleshooting methodology outlined previously wouldn't work. This is because you wouldn't see any traffic on the target server and therefore wouldn't know the error code. So, in this case of ResolveTimeout
or ConnectTimeout
you might want to capture a WinHTTP trace for additional insight. See the WinHTTP or WEBIO tracing section and at the following blogs for other examples on troubleshooting and tracing:
- 502.3 Bad Gateway "The operation timed out" with IIS Application Request Routing (ARR)
- Winhttp Tracing Options for Troubleshooting with Application Request Routing
502.3 Connection termination errors
502.3 errors are also returned when the connection between ARR and the member server is disconnected mid-stream. To test this type of problem, create an .aspx page that calls Response.Close()
. In the following example, there's a directory called "time", which is configured with an .aspx page as the default document in that directory. When you try to browse to the directory, ARR shows the following error message:
The error 0x80072efe corresponds to ERROR_INTERNET_CONNECTION_ABORTED. The request can be traced to the server that actually processed it using the same steps used earlier in this troubleshooter, with one exception. While Failed Request Tracing on the destination server shows the request processed on the server, the associated log entry doesn't appear in the IIS logs. Instead, this request is logged in the HTTPERR log as follows:
HTTP/1.1 GET /time/ - 1 Connection_Dropped DefaultAppPool
The built-in logs on the destination server don't provide any additional information about the problem, so the next step would be to gather a network trace from the ARR server. In the previous example, the .aspx page called Response.Close()
without returning any data. Viewing this in a network trace would show that a Connection: close
HTTP header was coming from the destination server. With this information, you could start an investigation into why the Connection: close
header was sent.
The following error is another example of an invalid response from the member server:
In this example, ARR started to receive data from the client but something went wrong while reading the request entity body. This resulted in the 0x80072f78 error code being returned. To investigate further, use Network Monitor on the member server to get a network trace of the problem. This particular error example was created by calling Response.Close()
in the ASP.net page after sending part of the response and then calling Response.Flush()
. If the traffic between the ARR server and the member servers is over SSL, then WinHTTP tracing on Windows Server 2008 or WebIO tracing on Windows Server 2008 R2 may provide additional information. WebIO tracing is described later in this troubleshooter.
502.4 No appropriate server could be found to route the request
The HTTP 502.4 error with an associated error code of 0x00000000 generally indicates that all the members of the farm are either offline or otherwise unreachable.
The first step is to verify that the member servers are online. To check this, go to the Servers node under the farm in the IIS Manager.
To bring back the offline servers, right-click on the server name, and select Add to Load Balancing. If you can't bring the servers back online, verify if the member servers are reachable from the ARR server. Check the trace Messages pane in the Servers page to look for some clues about the problem. If you're using Web Farm Framework (WFF) 2.0, you may receive this error when the application pool restarts. You need to restart the Web Farm Service to recover.
WinHTTP or WebIO tracing
Usually, packet capture tools like WireShark provide you with the information you need to identify exactly what's timing out. However, there are times (such as when the traffic is SSL encrypted) that you will need to try a different approach. Starting with Windows 7 and Windows Server 2008 R2, you can enable WinHTTP tracing using the netsh tool by running the following command from an administrative command prompt:
netsh trace start scenario=internetclient capture=yes persistent=no level=verbose tracefile=c:\temp\net.etl
Then, reproduce the problem. After the problem is reproduced, stop the tracing by running the following command:
netsh trace stop
The stop
command takes a few seconds to finish. When it's done, you find a net.etl file and a net.cab file in C:\temp
. You will need to ensure that the C:\temp
folder exists before running the commands above. The .cab file contains event logs and other data that may prove helpful in analyzing the .etl file.
To analyze the log,
Open it in Netmon 3.4 or later.
Make sure you have set up your parser profile. To achieve this, open the Tools > Options menu, select the Parser Profiles tab > Windows profile, and then select the Set as Active button to apply the changes.
Scroll through the trace until you find the w3wp.exe instance where ARR is running by correlating with the UT process name column.
Right-click on w3wp and select Add UT Process name to display filter. This sets the display filter similar to:
UTProcessName == "w3wp.exe (1432)"
You can further filter the results by changing it to the following:
UTProcessName == "w3wp.exe (<pid>)" AND ProtocolName == "WINHTTP_MicrosoftWindowsWinHttp"
You'll need to scroll through the output until you find the timeout error. In the following example, a request timed out because it took more than 30 seconds (ARR's default timeout) to run.
336 2:32:22 PM 7/22/2011 32.6380453 w3wp.exe (1432) WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:32:23.123 ::sys-recver starts in _INIT state
337 2:32:22 PM 7/22/2011 32.6380489 w3wp.exe (1432) WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:32:23.123 ::current thread is not impersonating
340 2:32:22 PM 7/22/2011 32.6380584 w3wp.exe (1432) WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:32:23.123 ::sys-recver processing WebReceiveHttpResponse completion (error-cdoe = ? (0x5b4), overlapped = 003728F0)
341 2:32:22 PM 7/22/2011 32.6380606 w3wp.exe (1432) WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:32:23.123 ::sys-recver failed to receive headers; error = ? (1460)
342 2:32:22 PM 7/22/2011 32.6380800 w3wp.exe (1432) WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:32:23.123 ::ERROR_WINHTTP_FROM_WIN32 mapped (?) 1460 to (ERROR_WINHTTP_TIMEOUT) 12002
343 2:32:22 PM 7/22/2011 32.6380829 w3wp.exe (1432) WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:32:23.123 ::sys-recver returning ERROR_WINHTTP_TIMEOUT (12002) from RecvResponse()
344 2:32:22 PM 7/22/2011 32.6380862 w3wp.exe (1432) WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:32:23.123 ::sys-req completes recv-headers inline (sync); error = ERROR_WINHTTP_TIMEOUT (12002)
In the following example, the content server was completely offline:
42 2:26:39 PM 7/22/2011 18.9279133 WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:26:39.704 ::WinHttpReceiveResponse(0x11d23d0, 0x0) {WINHTTP_MicrosoftWindowsWinHttp:4, NetEvent:3}
43 2:26:39 PM 7/22/2011 18.9279633 WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:26:39.704 ::sys-recver starts in _INIT state {WINHTTP_MicrosoftWindowsWinHttp:4, NetEvent:3}
44 2:26:39 PM 7/22/2011 18.9280469 WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:26:39.704 ::current thread is not impersonating {WINHTTP_MicrosoftWindowsWinHttp:4, NetEvent:3}
45 2:26:39 PM 7/22/2011 18.9280776 WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:26:39.704 ::sys-recver processing WebReceiveHttpResponse completion (error-cdoe = WSAETIMEDOUT (0x274c), overlapped = 003728F0) {WINHTTP_MicrosoftWindowsWinHttp:4, NetEvent:3}
46 2:26:39 PM 7/22/2011 18.9280802 WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:26:39.704 ::sys-recver failed to receive headers; error = WSAETIMEDOUT (10060) {WINHTTP_MicrosoftWindowsWinHttp:4, NetEvent:3}
47 2:26:39 PM 7/22/2011 18.9280926 WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:26:39.704 ::ERROR_WINHTTP_FROM_WIN32 mapped (WSAETIMEDOUT) 10060 to (ERROR_WINHTTP_TIMEOUT) 12002 {WINHTTP_MicrosoftWindowsWinHttp:4, NetEvent:3}
48 2:26:39 PM 7/22/2011 18.9280955 WINHTTP_MicrosoftWindowsWinHttp WINHTTP_MicrosoftWindowsWinHttp:12:26:39.704 ::sys-recver returning ERROR_WINHTTP_TIMEOUT (12002) from RecvResponse() {WINHTTP_MicrosoftWindowsWinHttp:4, NetEvent:3}
Other resources
Third-party information disclaimer
The third-party products that this article discusses are manufactured by companies that are independent of Microsoft. Microsoft makes no warranty, implied or otherwise, about the performance or reliability of these products.