May 2011

Volume 26 Number 05

Forecast: Cloudy - Load Balancing Private Endpoints on Worker Roles

By Joseph Fultz | May 2011

Early in January, David Browne and I worked on a solution to load balance internal service points on Azure Worker Roles. Generally, service endpoints in Worker Roles are published, so the load balancer can take care of balancing the calls across the instances. However, the customer with whom we were working needed endpoints that were not publicly addressable. In addition, they didn’t want to take on the latency of some type of queuing operation. How would we address this requirement?

During an internal event meant for us to explore various technologies and solutions, David and I came up with two different approaches for solving the challenge. For this month’s column, I’ll cover my design considerations and the bits of code used to prototype one of these approaches.

Not wanting to inadvertently bottleneck the final solution, we ruled out a software proxy-style solution. Instead, I chose a software mechanism that will provide a valid IP for service calls and the calling node will cache the endpoint for a given duration to reduce the overhead of endpoint resolution. The three primary strategies I considered were:

• Static assignment: assign a service endpoint to each calling node

• Centralized control: one node tracks and controls assignment of each calling node

• Cooperative control: allow any given node to indicate if it’s available to service calls

Each of these choices brings with it a set of benefits and a set of disadvantages.

Static assignment has the upside of being simple to implement. If the mapping of units of work between the caller and the worker is equal, then this might be a feasible approach for balancing, because the load-balancing solution for the Web Role will, by extension, balance the calls on the Worker Role.

The two primary disadvantages are that it doesn’t address high availability for the service, nor does it address any discrepancy in load between the caller and the service node. If I attempt to morph the static assignment solution to address the problems, the solution almost assuredly starts to move toward either centralized or cooperative control.

Centralized Control

A typical load balancer that receives health information and balances service requests based on such information utilizes centralized control. It collects information about the nodes, what it knows about the assignments it has made and any heartbeat information, and it directs requests made to the Virtual IP (VIP) to a given node.

In this scenario, the central point would do much the same, except it wouldn’t act as a proxy for the request. Instead, the calling node asks the central controller for a good address to call, and the controller assigns an address based on what it knows (see Figure 1). The calling node then caches the endpoint and uses it for a predetermined quantum; when the quantum expires, it repeats the resolution process.
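As a minimal sketch of that assignment step (the CentralController type and the plain round-robin policy here are my own illustrative assumptions, not part of the prototype), the controller could simply hand out addresses in rotation:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical central controller handing out endpoints round-robin.
// A real controller would fold heartbeat and health data into the
// assignment decision rather than rotating blindly.
public class CentralController
{
    private readonly List<string> _endpoints;
    private int _next;

    public CentralController(List<string> endpoints)
    {
        _endpoints = endpoints;
    }

    public string AssignEndpoint()
    {
        string ep = _endpoints[_next];
        _next = (_next + 1) % _endpoints.Count;  // wrap around the pool
        return ep;
    }
}
```

The caller would cache whatever address this returns for its quantum, exactly as described above.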


Figure 1 Centralized Control

The intelligence all lies within the central controller and it must track all of the requisite information needed to determine to which node to assign calls. This could be as simple as round robin or it could take on full data collection and health analysis. It could also be complicated by the various service endpoints having different criteria to determine availability, meaning that the central controller has to be all-knowing across service implementations in the worker pool.

The biggest detractor from this implementation is that, if the central controller is down, then the system is down. This means a completely separate solution must be implemented to solve for high availability of the central controller.

In some robotic and matrix systems, worker nodes will elect a primary controller and, if the heartbeat is lost, they simply elect a new one. While this is a good design because it combines both the centralized control and cooperative control, it also adds significantly to the implementation of the load-distribution mechanism.

Cooperative Control

Anyone who’s gone through management to get someone to do something knows it can be a real hindrance to actually getting someone to do the work. Asking him directly if he has time to do it turns out to be much more expedient and, given he’s a good judge of effort, is the best way to determine if he actually has time to do the work. Such is the model I followed.

The idea is that each of the calling nodes will start with its currently assigned service endpoint and ask whether it’s still available (see Figure 2). If it isn’t, the node will continue to round robin through the available pool until one responds positively (see Figure 3). After that, the same expiry cache mechanism described earlier is used to reduce the endpoint resolution overhead.


Figure 2 Cooperative Control


Figure 3 Balancing to Another Node

The upside of this design is that high availability (HA) is taken care of by design, and there should be high fidelity between a node’s assessment of its own availability and the worker actually being able to service callers. Each of the service nodes should have the intelligence baked into its implementation to be aware of conditions specific to the service that would make it available or not. This intelligence goes beyond CPU and the like; it could include, for example, the availability of downstream systems the node accesses. Thus, if the node returns a negative, an error or a timeout, the calling node queries the next available service node and, if available, makes its service calls to that endpoint.

The big detractor from this solution is that it requires implementation on both sides of the fence to provide an availability service and a calling protocol between the caller and the endpoints to ascertain the endpoint availability.
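Based on the client calls that appear later in the prototype (IsAvailable and DisableNode), the contract behind that availability service would look something like the following sketch; the interface name is my assumption:

```csharp
using System.ServiceModel;

// Hypothetical WCF contract for the availability service exposed
// on each worker's internal endpoint.
[ServiceContract]
public interface IEndPointServices
{
    // Returns true when this node is willing to take service calls.
    [OperationContract]
    bool IsAvailable();

    // Takes this node out of the pool for a fixed quantum.
    [OperationContract]
    void DisableNode();
}
```

Each Worker Role would host an implementation of this contract alongside its real service endpoints.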

The Prototype

The sample will do the following things:

  • Set up a standard mechanism to determine availability
  • Have the caller cache an available node for a brief period
  • Allow me to disable a node for a set quantum, which should show up as all of the calls being balanced to a single node
  • Once the node becomes available again, let the caller return to the previous node

Some caveats: First, I’m not doing any work to intelligently determine availability; I’m just setting up the balancing mechanism and not worrying about the intelligence behind the decision. Additionally, I’m not handling errors and timeouts, but those would be handled in the same manner as getting a negative result from the availability query. Finally, I’m simply grabbing all Worker Roles in the deployment; in a true implementation, a more intelligent way to determine all available service endpoints might be desired, such as a registry mechanism, or simply attempting to hit the service on each endpoint and marking successful calls as possible endpoints. The code does go as far as to ask for a specific private endpoint, and if that differs per service, it could be used as a differentiator.

The first thing to do is get the list of IPs from the Worker Roles in the deployment. To accomplish that goal, I have to configure the roles. For the Worker Roles, I open the configuration window and add an internal service endpoint, as shown in Figure 4.


Figure 4 Adding an Internal Service Endpoint to the Worker Role

I’ve also labeled the Worker Roles in the deployment as PrivateServices. Using the API of the RoleEnvironment object and the label, it’s easy to fetch the nodes:

if (_CurrentUriString == null) {
  ServiceInstances = null;
  WebInstances = null;

  // Grab the instances by role name; the names used here must
  // match the role names in the service definition.
  ServiceInstances =
    RoleEnvironment.Roles["PrivateServices"].Instances;
  WebInstances =
    RoleEnvironment.Roles["WebRole"].Instances;
}
I’ll match the starting node for checking availability by using the ordinal number of the node. If there are more Web Roles than Worker Roles, I’ll use a mod function to match a starting node. With the instances in hand and a starting node to test availability, I can start to loop through and test the endpoints (see Figure 5).
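That starting-node calculation reduces to a simple mod; this hypothetical helper (not from the original project) illustrates the mapping:

```csharp
// Hypothetical helper: map a Web Role's ordinal to a starting
// Worker Role index, wrapping when callers outnumber service nodes.
static class StartNode
{
    public static int StartingServiceIndex(int webOrdinal, int serviceCount)
    {
        return webOrdinal % serviceCount;
    }
}
```

With three service nodes, Web Role ordinals 0 through 5 would start their availability checks at indexes 0, 1, 2, 0, 1, 2, spreading the initial load across the pool.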

Figure 5 Testing Endpoints

while (!found && !Abort) {
  // The endpoint name must match the internal endpoint
  // configured in Figure 4.
  string testuri = ServiceInstances[idxSvcInstance].
    InstanceEndpoints["EndPointServices"].IPEndpoint.ToString();

  found = CheckAvailability(testuri);
  if (found) {
    ServiceUriString = testuri;
  }
  else {
    // Move to the next node, wrapping around the pool.
    idxSvcInstance++;
    if (idxSvcInstance >= ServiceInstances.Count) {
      idxSvcInstance = 0;
    }
    // Give up once every node has been tried.
    loopCounter++;
    if (loopCounter == ServiceInstances.Count) {
      Abort = true;
    }
  }
}

Note that there is a call to a function named CheckAvailability (see Figure 6). Within that function I create a binding using None for the security mode, because the endpoint is internal only. I instantiate the service client, set a reasonable timeout and return the value of the call.

Figure 6 CheckAvailability

static public bool CheckAvailability(string uri) {
  bool retval = true;
  Binding binding = new NetTcpBinding(SecurityMode.None);
  EndPointServicesRef.EndPointServicesClient endpointsvc =
    new EndPointServicesRef.EndPointServicesClient(binding,
    new EndpointAddress(@"net.tcp://" + uri));
  // Keep the timeout short (5 seconds here) so an unresponsive
  // node doesn't stall endpoint resolution.
  endpointsvc.InnerChannel.OperationTimeout =
    new System.TimeSpan(0, 0, 0, 0, 5000);
  try {
    retval = endpointsvc.IsAvailable();
  }
  catch (Exception ex) {
    // TODO: log and handle the exception; for now any failure
    // simply marks the node unavailable.
    retval = false;
  }
  return retval;
}

If an error occurs during the call, I simply return false and allow the loop to move on to the next node and check its availability. Note, however, that to determine the Web Role instance number under which the code is currently executing, I’ve parsed the instance ID. To make this work at all, I had to open an arbitrary internal (it could have been external) endpoint. If I hadn’t, the ID wouldn’t increment and the parse would be useless, because every node would look like the only one.

Another way to create a list of nodes would be to iterate through all of the nodes, identifying the ordinal position of the currently executing node in the list, or to simply order them by the last octet of the IP. Either of those two methods would be a little more foolproof, but for this particular example I just used the instance ID.
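As an illustration of the last-octet idea, a hypothetical helper (not part of the prototype) could order the IPs like this:

```csharp
using System;
using System.Linq;

// Hypothetical helper: order node IPs numerically by their final
// octet so every caller derives the same ordinal list on its own,
// without a shared registry.
static class NodeOrdering
{
    public static string[] OrderByLastOctet(string[] ips)
    {
        return ips
            .OrderBy(ip => int.Parse(ip.Split('.').Last()))
            .ToArray();
    }
}
```

Because the ordering is derived from the addresses themselves, it stays stable across callers as long as the instance list is the same.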

One more caveat is that the structure of the ID differs between the actual deployment and the development fabric, thus forcing me to handle it in the parse code, like so:

// The ID format differs between a cloud deployment and the
// development fabric, so fall back to splitting on '_' if the
// first parse fails.
string[] IdArray =
  RoleEnvironment.CurrentRoleInstance.Id.Split('.');
int idxWebInstance = 0;
if (!int.TryParse(IdArray[IdArray.Length - 1],
  out idxWebInstance)) {
  IdArray = RoleEnvironment.CurrentRoleInstance.Id.Split('_');
  idxWebInstance = int.Parse(IdArray[IdArray.Length - 1]);
}
This should return a good endpoint IP that I can cache in a static variable. I then set a timer; when the timer event fires, I set the endpoint to null, causing the code to once again look for a valid endpoint to use for services:

System.Timers.Timer invalidateTimer =
  new System.Timers.Timer(5000);
invalidateTimer.Elapsed += (sender, e) =>
  _CurrentUriString = null;
invalidateTimer.Start();

Here I used a short duration of 5 seconds because I want to ensure that in a short test execution I can bounce at least one Web Role to another endpoint once I disable one of the service nodes.
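The same cache-and-expire behavior can also be written without a timer. This sketch (a hypothetical CachedEndpoint type, not part of the prototype) simply re-resolves on the first call after the quantum elapses:

```csharp
using System;

// Hypothetical alternative to the timer-based invalidation:
// resolve once, reuse the endpoint until the quantum elapses,
// then resolve again on the next request.
public class CachedEndpoint
{
    private readonly TimeSpan _quantum;
    private readonly Func<string> _resolve;
    private string _uri;
    private DateTime _expires = DateTime.MinValue;

    public CachedEndpoint(TimeSpan quantum, Func<string> resolve)
    {
        _quantum = quantum;
        _resolve = resolve;
    }

    public string Get()
    {
        if (_uri == null || DateTime.UtcNow >= _expires)
        {
            _uri = _resolve();  // run the endpoint resolution loop
            _expires = DateTime.UtcNow + _quantum;
        }
        return _uri;
    }
}
```

The resolve delegate would wrap the availability loop from Figure 5; within the quantum, every call is served from the cached address.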

Running the Demo

Now I’m going to modify the default page and its codebehind to simply show the node to which it established an affinity. I’ll also add a button to disable a node. Both pieces of code are pretty simple. For example, the disable button will disable the service endpoint associated with the Web page to which the request gets balanced, so it can lead to a slightly quirky UI feel for this test sample.

I’ll add a label and a command button to the UI. In the label I’ll print out the ID of the assigned endpoint, and the button will allow me to disable a node so I can see all Web Roles associated with a single endpoint until the node comes back online. Inside the codebehind I add a little code to the page load to get the endpoint (see Figure 7).

Figure 7 Demo Page Code

protected void Page_Load(object sender, EventArgs e) {
  string UriString = EndpointManager.GetEndPoint();

  Binding binding = new NetTcpBinding(SecurityMode.None);
  EndPointServicesRef.EndPointServicesClient endpointsvc =
    new EndPointServicesRef.EndPointServicesClient(binding,
    new EndpointAddress(@"net.tcp://" + UriString));
  // Reuse IsAvailable here simply to show a live call to the
  // resolved endpoint.
  lblMessage.Text = "WebInstance ID: " +
    RoleEnvironment.CurrentRoleInstance.Id.ToString() +
    " is Calling Service @ " + UriString + " & IsAvailable = " +
    endpointsvc.IsAvailable().ToString();
}

Because I’m really only attempting to illustrate the cooperative balancing, I haven’t implemented another service method or interface, so I simply reuse the IsAvailable method to illustrate the point.

Figure 8 shows the prototype application in action. First you can see the ID (this one is from the development fabric), the IP and whether it’s available. Refreshing the page causes the request to balance, so the endpoint also shows up differently. If I click the disable button, a small piece of code runs to call DisableNode for the current endpoint:

protected void cmdDisable_Click(object sender, EventArgs e) {
  Binding binding = new NetTcpBinding(SecurityMode.None);
  EndPointServicesRef.EndPointServicesClient endpointsvc =
    new EndPointServicesRef.EndPointServicesClient(binding,
    new EndpointAddress(@"net.tcp://" + LastUri));
  // Take the node that served this page out of the pool.
  endpointsvc.DisableNode();
}


Figure 8 Running the Demo

The DisableNode method simply sets the Boolean and then sets up a timer to enable it back. The timer is set to be a bit longer than the expiry for the cached endpoint so as to make it easier to illustrate this in the test run:

public void DisableNode() {
  AvailabilityState.Enabled = false;
  AvailabilityState.Available = false;
  // Re-enable the node after 20 seconds, which is longer than the
  // endpoint cache expiry so the rebalancing is easy to observe.
  System.Timers.Timer invalidateTimer =
    new System.Timers.Timer(20000);
  invalidateTimer.Elapsed += (sender, e) => EnableNode();
  invalidateTimer.Start();
}

With the node disabled, the subsequent requests coming from different Web servers should all balance to the same worker endpoint.

Beyond the Example

This is obviously a trivial example to illustrate the point, but I want to highlight some things to consider for an actual implementation. I also want to mention David’s implementation to solve the problem because he addressed a domain of problems that I did not.

It was my intent for this example that the calling node would run the endpoint resolution code as part of the role startup process. It would cache the endpoint in a static member or in an actual cache, refreshing based on cache expiry. However, the resolution could instead be combined with the service implementation itself, allowing fine-grained control, versus the unit of balancing being the IP and port combo. Depending on the actual problem being solved and the design of the service fabric, I might choose one style over the other.

To get this running in a production environment, here are some things to consider and possibly resolve:

  • The intelligence for deciding availability. This means not only the metrics that might be examined (CPU, disk, back-end connection state and so on), but also the thresholds that should be used to flip the bit between available and unavailable.
  • Logic to handle the case that all return unavailable.
  • Decisions about the quantum to cache the endpoint.
  • Some additional methods in the EndpointManager to change settings, remove nodes from the pool and general runtime maintenance.
  • All of the typical exception handling and diagnostics usually included in a service implementation.

I realize that those things probably go without saying, but I like to stick with a guideline of “No Guessing.”

In quick summary of David’s approach, he set up a matrix between Fault Domains and Upgrade Domains in an attempt to ensure that the caller availability matched the endpoint availability by preferring endpoints in the same domains. I think this is a great idea. Combining my implementation with his would ensure that your Web Role is serviced by a Worker Role following the same service level agreement if at all possible, but in the case that none are available, it would have the ability to balance to any other node.

Final Thoughts

I hope Azure will evolve to allow load balancing for private endpoints as a point of configuration. Until then, if it’s something you need (raw sockets will almost always want a level of protection by being internal), a code solution will probably be the easiest way to go. Segmenting the endpoint resolution calls away from the actual service calls and making them part of the startup keeps the value-add code clean and separate from the foundation code. Thus, once a feature like this becomes configurable, the services should continue to work while allowing you to disable the balancing code.

Joseph Fultz is an architect at the Microsoft Technology Center in Dallas, where he works with both enterprise customers and ISVs designing and prototyping software solutions to meet business and market demands. He’s spoken at events such as Tech·Ed and similar internal training events.
