Troubleshooting Scenario 4 - Windows Azure Traffic Manager Degraded Status

 

This post will describe how to troubleshoot a Windows Azure Traffic Manager profile which is showing a Degraded status, and provide some key points to understand about traffic manager probes. This is a continuation of the troubleshooting series.

 

Symptom

You have configured a Windows Azure Traffic Manager profile pointing to some of your .cloudapp.net hosted services and after a few seconds you see the Status as Degraded.

 

image

 

If you go into the Endpoints tab of that profile you will see one or more of the endpoints in an Offline status:

image

 

 

Important notes about WATM probing

  • WATM only considers an endpoint as ONLINE if the probe gets a 200 back from the probe path.
  • A 30x redirect (or any other non-200 response) will fail, even if the redirected URL returns a 200.
  • For HTTPs probes, certificate errors are ignored.
  • The actual content of the probe path doesn’t matter, as long as a 200 is returned. A common technique if the actual website content doesn’t return a 200 (ie. if the ASP pages redirect to an ACS login page or some other CNAME URL) is to set the path to something like “/favicon.ico”.
  • Best practice is to set the Probe path to something which has enough logic to determine if the site is up or down. In the above example setting the path to “/favicon.ico” you are only testing if w3wp.exe is responding, but not if your website is healthy. A better option would be to set a path to something such as “/Probe.aspx”, and within Probe.aspx include enough logic to determine if your site is healthy (ie. check perf counters to make sure you aren’t at 100% CPU or receiving a large number of failed requests, attempt to access resources such as the database or session state to make sure the application’s logic is working, etc).
  • If all endpoints in a profile are degraded then WATM will treat all endpoints as healthy and route traffic to all endpoints. This is to ensure that any potential problem with the probing mechanism which results in incorrectly failed probes will not result in a complete outage of your service.

 

Troubleshooting

The best tool for troubleshooting WATM probe failures is wget. You can get the binaries and dependencies package from https://gnuwin32.sourceforge.net/packages/wget.htm. Note that you can use other programs such as Fiddler or curl instead of wget – basically you just need something that will show you the raw HTTP response.

Once you have wget installed, go to a command prompt and run wget against the URL + Probe port & path that is configured in WATM. For this example it would be https://watestsdp2008r2.cloudapp.net:80/Probe.

image

 

image

 

Notice that wget indicates that the URL returned a 301 redirect to https://watestsdp2008r2.cloudapp.net/Default.aspx. As we know from the “Important notes about WATM probing” section above, a 30x redirect is considered a failure by WATM probing and this will cause the probe to report Offline. At this point it is a simple matter to check the website configuration and make sure that a 200 is returned from the /Probe path (or reconfigure the WATM probe to point to a path which will return a 200).

 

If your probe is using HTTPs protocol you will want to add the “--no-check-certificate” parameter to wget so that it will ignore the certificate mismatch on the cloudapp.net URL.

Comments

  • Anonymous
    August 05, 2014
    What are all the possible source IP address for the traffic manager probes?

  • Anonymous
    August 05, 2014
    As of today the possible source IP address is the entire IP address range for Azure Compute (www.microsoft.com/.../details.aspx).  There is some work to narrow down this range, but no ETA at this time.

  • Anonymous
    September 23, 2014
    How can one reliably detect if an incoming request is from Traffic Manager? I see it sends requests with the "GTMProbe" string in the user agent, but without any assurance that this user agent string could change in the future, I wouldn't want to count on that as a detection strategy.

  • Anonymous
    September 23, 2014
    The comment has been removed

  • Anonymous
    June 30, 2015
    How can I use TrafficManager with a WebApp, that runs under a custom domain and uses a rewrite URL (so all requests going to that site via different bindings end up under the custom domain)? I understand that every first request no being a 200 will lead to degraded status. Also, as all the endpoints are in a degraded status because of above redirect, requests will be routed.