October 2010

Volume 25 Number 10

Forecast: Cloudy - Performance-Based Scaling in Microsoft Azure

By Joseph Fultz | October 2010

Without a doubt, cloud computing is gaining lots of mindshare, and its practical use is building momentum across technology platforms and throughout the industry. Cloud computing isn’t a new or revolutionary concept; indeed, it has been around for years in the form of shared hosting and other such services. Now, however, advances in technology and years of experience running servers and services have made cloud computing not only technically practical, but increasingly interesting to both consumers and providers.

Progress in cloud computing will reach beyond IT and touch every part of your company—from the people managing hardware and services, to the developers and architects, to the executives who will approve the budget and pay the bills. You’d better be prepared for it.

In this column I’ll focus primarily on the developers and architects who need to understand and leverage cloud computing in their work. I’ll supply some guidance on how to accomplish a given task, including notes on architecture considerations and their impact on cost and performance. Please tell me what you think of the topics I cover and, even more importantly, about topics that are of particular interest in cloud computing.

Seeding the Cloud

One of the first benefits people focus on in cloud computing is the idea that application owners don’t have to worry about infrastructure setup, configuration or maintenance. Let’s be honest: that’s pretty compelling.

However, I think it’s more important to focus on the ability to scale up or down to serve the needs of the application owner, thereby creating a more efficient cost model without sacrificing performance or wasting resources. In my experience, demand elasticity is something that comes up in any conversation about the cloud, regardless of the platform being discussed.

In this installment I’ll demonstrate how to use performance counter data from running roles to automate the process of shrinking or growing the number of instances of a particular Web Role. To do this, I’ll take a look at a broad cross-section of Azure features and functionality, including Azure Compute, Azure Storage and the REST Management API.

The concept is quite simple: test collected performance data against a threshold and then scale the number of instances up or down accordingly. I won’t go into detail about collecting diagnostic data—I’ll leave that to you or to a future installment. Instead, I’ll examine performance counter data that has been dumped to a table in Azure Storage, as well as the code and setup required to execute the REST call to change the instance count in the configuration. Moreover, the downloadable code sample will contain a simple page that will make REST management calls to force the instance count to change based on user input. The scenario is something like the drawing in Figure 1.

Figure 1 Performance-Based Scaling

Project Setup

To get things started, I created an Azure Cloud Service project that contains one Worker Role and one Web Role. I configured the Web Role to publish performance counter data, specifically % Processor Time, from the role and push it to storage every 20 seconds. The code to get that going lives inside of the WebRole::OnStart method and looks something like this:

// Start from the default diagnostics configuration for the role
var config = DiagnosticMonitor.GetDefaultInitialConfiguration();

var performanceConfiguration = 
  new PerformanceCounterConfiguration();
performanceConfiguration.CounterSpecifier = 
  @"\Processor(_Total)\% Processor Time";
performanceConfiguration.SampleRate = 
  System.TimeSpan.FromSeconds(1.0);

// Add the new performance counter to the configuration 
config.PerformanceCounters.DataSources.Add(
  performanceConfiguration);

// Push the collected counter data to storage every 20 seconds
config.PerformanceCounters.ScheduledTransferPeriod = 
  System.TimeSpan.FromSeconds(20.0);

// Apply the configuration; the setting name must match the role's 
// diagnostics connection string setting
DiagnosticMonitor.Start("DiagnosticsConnectionString", config);

This code registers the performance counter, sets the sampling rate, schedules the transfer of collected data to storage and starts the diagnostic monitor. The values I used for the intervals work well for this sample, but they aren’t values I’d use in a production system, where the collection interval would be much longer because I’d be concerned with 24/7 operations, and the transfer interval would be longer to reduce the number of transactions against Azure Storage.

Next I create a self-signed certificate that I can use to make the Azure REST Management API calls. Every request has to be authenticated, and the certificate is the means to accomplish this. I followed the instructions for creating a self-signed certificate in the TechNet Library article “Create a Self-Signed Server Certificate in IIS 7” (technet.microsoft.com/library/cc753127(WS.10)). I exported both a .cer file and a .pfx file. The .cer file gets uploaded to the subscription as a management certificate so the API can authenticate my requests, and the .pfx file gets imported into the compute role via the management interface so the roles can retrieve the certificate at run time (see Figure 2).

Figure 2 Importing Certificates

I’ll come back later and grab the thumbprint to put it in the settings of both the Web Roles and Worker Roles that I’m creating so they can access the certificate store and retrieve the certificate.

Finally, to get this working in Azure, I need a compute project where I can publish the two roles and a storage project to which I can transfer the performance data. With these elements in place, I can move on to the meat of the work.

Is It Running Hot or Cold?

Now that I’ve got the Web Role configured and code added to publish the performance counter data, the next step is to fetch that data and compare it to a threshold value. I’ll create a TestPerfData method in which I retrieve the data from the table and test the values. I’ll write a LINQ statement similar to the following:

double AvgCPU = (
  from d in selectedData
  where d.CounterName == 
    @"\Processor(_Total)\% Processor Time"
  select d.CounterValue).Average();

By comparing the average utilization, I can determine the current application performance. If the instances are running too hot, I can add instances. If they’re running cold and I’m wasting resources—meaning money—by having running instances I don’t need, I can reduce the number of instances.

You’ll find in-depth coverage of the code and setup needed to access the performance counter table data in a blog post I wrote at blogs.msdn.com/b/joseph_fultz/archive/2010/06/30/querying-azure-perf-counter-data-with-linq.aspx. I use a simple if-then-else block to assess the state and determine the desired action. I’ll cover the details after I’ve created the functions needed to change the running service configuration.
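For reference, here’s a rough sketch of how selectedData might be populated from the WADPerformanceCountersTable that Azure Diagnostics writes to. This isn’t the download’s exact code: the PerfCounterEntity class, the "DiagnosticsConnectionString" setting name and the five-minute window are my own assumptions for illustration:

// Hypothetical entity shape for rows in WADPerformanceCountersTable; 
// only the columns used by the threshold test are declared
public class PerfCounterEntity : 
  Microsoft.WindowsAzure.StorageClient.TableServiceEntity {
  public string Role { get; set; }
  public string CounterName { get; set; }
  public double CounterValue { get; set; }
}

// Build a table service context over the diagnostics storage account
CloudStorageAccount account = CloudStorageAccount.Parse(
  RoleEnvironment.GetConfigurationSettingValue(
  "DiagnosticsConnectionString"));
TableServiceContext tableContext = 
  account.CreateCloudTableClient().GetDataServiceContext();

// The diagnostics tables use "0" + tick count as the PartitionKey, so a 
// string comparison limits the query to roughly the last five minutes
string partitionFilter = 
  "0" + DateTime.UtcNow.AddMinutes(-5).Ticks.ToString("D19");
var selectedData = (
  from row in tableContext.CreateQuery<PerfCounterEntity>(
    "WADPerformanceCountersTable")
  where row.PartitionKey.CompareTo(partitionFilter) >= 0
  select row).ToList();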

Using the REST Management API

Before I can finish the TestPerfData method, I have a little more work to do. I need a few methods to help me discover the number of instances of a given role, create a new valid service configuration for that role with an adjusted instance count, and, finally, allow me to update the configuration.

To this end I’ve added a class file to my project and created the six static methods shown in Figure 3.

Figure 3 Configuration Methods

Method               Description
GetDeploymentInfo    Retrieves the deployment information, including the encoded service configuration.
GetServiceConfig     Retrieves and decodes the service configuration from the deployment information.
GetInstanceCount     Fetches the instance count for a specified role.
ChangeInstanceCount  Updates the instance count in the service configuration and returns the complete, updated XML.
ChangeConfigFile     Pushes the service configuration provided to the function out to the running service.
LookupCertificate    Takes the name of the environment setting that holds the thumbprint and retrieves the matching certificate from the certificate store.
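GetInstanceCount and ChangeInstanceCount work purely on the service configuration XML rather than calling the API. As a hedged sketch, assuming the ServiceConfiguration schema and signatures that match how they’re called later in the Worker Role (the helpers in the downloadable sample may be structured differently), they could look like this:

// Requires System.Linq and System.Xml.Linq
static string GetInstanceCount(string serviceConfig, string roleName) {
  XElement config = XElement.Parse(serviceConfig);
  XNamespace ns = config.Name.Namespace;
  // Find the named Role element and read its Instances count attribute
  XElement role = config.Elements(ns + "Role")
    .Single(r => (string)r.Attribute("name") == roleName);
  return (string)role.Element(ns + "Instances").Attribute("count");
}

static string ChangeInstanceCount(
  string serviceConfig, string roleName, string newCount) {
  XElement config = XElement.Parse(serviceConfig);
  XNamespace ns = config.Name.Namespace;
  XElement role = config.Elements(ns + "Role")
    .Single(r => (string)r.Attribute("name") == roleName);
  // Update the count and hand back the complete, updated XML
  role.Element(ns + "Instances").SetAttributeValue("count", newCount);
  return config.ToString();
}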

The calls that interact with the REST Management API must include a certificate. To accomplish this, the certificate is added to the hosted service and the thumbprint is added to the role configuration and used to fetch the certificate at run time. Once the service and role are configured properly, I use the following code to grab the certificate from the Certificate Store:

// Read the certificate thumbprint from the role's configuration settings
string Thumbprint = 
  RoleEnvironment.GetConfigurationSettingValue(
  ThumbprintSettingName);

// Open the LocalMachine\My store and find the certificate by thumbprint
X509Store certificateStore = 
  new X509Store(StoreName.My, StoreLocation.LocalMachine);
certificateStore.Open(OpenFlags.ReadOnly);
X509Certificate2Collection certs = 
  certificateStore.Certificates.Find(
  X509FindType.FindByThumbprint, Thumbprint, false);

This is the main code of the LookupCertificate method, and it’s called in the methods where I want to interact with the REST API. I’ll review the GetDeploymentInfo function as an example of how calls are constructed. For this example, I’ve hardcoded some of the variables needed to access the REST API:

string x_ms_version = "2009-10-01";
string SubscriptionID = "[your subscription ID]";
string ServiceName = "[your service name]";
string DeploymentSlot = "Production";

I need to create a HttpWebRequest with the proper URI, set the request headers and add my certificate to it. Here I build the URI string and create a new HttpWebRequest object using it:

string RequestUri = "https://management.core.windows.net/" + 
  SubscriptionID + "/services/hostedservices/"+ 
  ServiceName + "/deploymentslots/" + DeploymentSlot;
HttpWebRequest RestRequest = 
  (HttpWebRequest)HttpWebRequest.Create(RequestUri);

For the call to be valid, it must include the version in the header. Thus, I create a name-value collection, add the version key and data, and add that to the request headers collection:

NameValueCollection RequestHeaders = 
  new NameValueCollection();
RequestHeaders.Add("x-ms-version", x_ms_version);
// The collection was just created, so no null check is needed
RestRequest.Headers.Add(RequestHeaders);

The last thing to do to prepare this particular request is to add the certificate to the request:

X509Certificate cert = LookupCertificate("RESTMgmtCert");
RestRequest.ClientCertificates.Add(cert);

Finally, I execute the request and read the response:

string ResponseBody;
WebResponse RestResponse = RestRequest.GetResponse();
using (StreamReader RestResponseStream = new StreamReader(RestResponse.GetResponseStream(), true)) {
  ResponseBody = RestResponseStream.ReadToEnd();
}

That’s the general pattern I used to construct requests made to the REST Management API. The GetServiceConfig function extracts the Service Configuration out of the deployment configuration, using LINQ to XML statements like the following:

XElement DeploymentInfo = XElement.Parse(DeploymentInfoXML);
string EncodedServiceConfig = 
  (from element in DeploymentInfo.Elements()
   where element.Name.LocalName.Trim().ToLower() == "configuration"
   select (string)element.Value).Single();
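
The Configuration value extracted here is base64-encoded, so GetServiceConfig has to decode it before the XML can be parsed; a one-line sketch, assuming UTF-8 encoding:

string ServiceConfig = 
  Encoding.UTF8.GetString(
  Convert.FromBase64String(EncodedServiceConfig));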

In my code, the return value of GetServiceConfig is passed on to the GetInstanceCount or ChangeInstanceCount functions (or both) to read or update the instance count. The return from ChangeInstanceCount is an updated service configuration, which is passed to ChangeConfigFile. In turn, ChangeConfigFile pushes the update to the service by constructing a request similar to the previous one used to fetch the deployment information, with these important differences (see the sketch after the list):

  1. “/?comp=config” is added to the end of the URI
  2. The PUT verb is used instead of GET
  3. The updated configuration is streamed as the request body
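
Here’s a hedged sketch of how ChangeConfigFile might put those differences together. The body wrapper follows the documented schema of the Change Deployment Configuration operation; the helper in the downloadable sample may differ in the details:

string RequestUri = "https://management.core.windows.net/" + 
  SubscriptionID + "/services/hostedservices/" + 
  ServiceName + "/deploymentslots/" + DeploymentSlot + 
  "/?comp=config";
HttpWebRequest RestRequest = 
  (HttpWebRequest)HttpWebRequest.Create(RequestUri);
RestRequest.Method = "PUT";
RestRequest.ContentType = "application/xml";
RestRequest.Headers.Add("x-ms-version", x_ms_version);
RestRequest.ClientCertificates.Add(
  LookupCertificate("RESTMgmtCert"));

// The updated service configuration is sent base64-encoded inside a 
// ChangeConfiguration element
string RequestBody = 
  "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
  "<ChangeConfiguration xmlns=\"http://schemas.microsoft.com/windowsazure\">" +
  "<Configuration>" + 
  Convert.ToBase64String(Encoding.UTF8.GetBytes(UpdatedSvcConfig)) + 
  "</Configuration></ChangeConfiguration>";

// Stream the configuration as the request body and submit the PUT
byte[] BodyBytes = Encoding.UTF8.GetBytes(RequestBody);
RestRequest.ContentLength = BodyBytes.Length;
using (Stream RequestStream = RestRequest.GetRequestStream()) {
  RequestStream.Write(BodyBytes, 0, BodyBytes.Length);
}
using (WebResponse RestResponse = RestRequest.GetResponse()) {
  // A successful request returns 202 (Accepted); the configuration 
  // change is applied asynchronously
  Trace.TraceInformation("Configuration update submitted.");
}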

Putting It All Together

With the functions in place to look up and change the service configuration, and having done the other preparatory work such as setting up counters, configuring the connection string settings for Storage, and installing certificates, it’s time to implement the CPU threshold test.

The Visual Studio template produces a Worker Role that wakes up every 10 seconds to execute code. To keep things simple, I’m leaving that loop in place but adding a single timer that fires every five minutes. In the timer handler, a simple conditional statement tests whether average utilization is higher or lower than 85 percent. Because I’m deploying the Web Role with two instances and the average will come in below that threshold, the instance count is guaranteed to decrease from the initial two instances to a single instance.

Inside the Worker Role I have a Run method that declares and instantiates the timer. Inside the timer-elapsed handler I add a call to the TestPerfData function I created earlier. For this sample, I’m skipping the implementation of the greater-than condition because I know that the CPU utilization will not be that high. I set the threshold for the less-than condition to 85 percent, as I’m sure the counter average will come in lower than that. Setting these contrived conditions allows me to see the change via the Web management console or via Server Explorer in Visual Studio.
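
A minimal sketch of that Run method follows. The five-minute interval and the call to TestPerfData come straight from the description above; the names and trace messages are only illustrative:

public override void Run() {
  Trace.TraceInformation("WorkerRole entry point called.");

  // Fire the threshold test every five minutes
  System.Timers.Timer perfTimer = new System.Timers.Timer(300000.0);
  perfTimer.Elapsed += (sender, e) => TestPerfData();
  perfTimer.Start();

  // The template's original 10-second loop, left in place
  while (true) {
    Thread.Sleep(10000);
    Trace.TraceInformation("Working");
  }
}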

In the less-than-85-percent block I check the instance count, modify the service configuration and update the running service configuration, as shown in Figure 4.

Figure 4 The Less-Than-85-Percent Block

else if (AvgCPU < 85.0) {
  Trace.TraceInformation("in the AvgCPU < 85 test.");
  string deploymentInfo = 
    AzureRESTMgmtHelper.GetDeploymentInfo();
  string svcconfig = 
    AzureRESTMgmtHelper.GetServiceConfig(deploymentInfo);
  int InstanceCount = 
    System.Convert.ToInt32(
    AzureRESTMgmtHelper.GetInstanceCount(
    svcconfig, "WebRole1"));
  if (InstanceCount > 1) {
    InstanceCount--;
    string UpdatedSvcConfig = 
      AzureRESTMgmtHelper.ChangeInstanceCount(
      svcconfig, "WebRole1", InstanceCount.ToString());
    AzureRESTMgmtHelper.ChangeConfigFile(UpdatedSvcConfig);
  }
}

I make sure to check the instance count before adjusting down, because I don’t want it to go to zero, as this is not a valid configuration and would fail.
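
For completeness, the greater-than branch that the sample skips would mirror the same pattern, incrementing the count instead. The cap on the instance count below is my own addition, not part of the sample, to keep a runaway counter from driving up cost:

if (AvgCPU > 85.0) {
  Trace.TraceInformation("in the AvgCPU > 85 test.");
  string deploymentInfo = 
    AzureRESTMgmtHelper.GetDeploymentInfo();
  string svcconfig = 
    AzureRESTMgmtHelper.GetServiceConfig(deploymentInfo);
  int InstanceCount = 
    System.Convert.ToInt32(
    AzureRESTMgmtHelper.GetInstanceCount(
    svcconfig, "WebRole1"));
  int MaxInstanceCount = 4;  // illustrative cap, not from the sample
  if (InstanceCount < MaxInstanceCount) {
    InstanceCount++;
    string UpdatedSvcConfig = 
      AzureRESTMgmtHelper.ChangeInstanceCount(
      svcconfig, "WebRole1", InstanceCount.ToString());
    AzureRESTMgmtHelper.ChangeConfigFile(UpdatedSvcConfig);
  }
}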

Running the Sample

I’m now ready to execute the example and demonstrate elasticity in Azure. Knowing that my code is always right the first time—ahem—I right-click on the Cloud Service Project and click Publish. The dialog gives you the option to configure your credentials, which I’ve already done (see Figure 5).

Figure 5 Publishing the Project

I click OK and just have to wait for the package to be copied up and deployed. When deployment is complete, I switch to the Web management console and see two Web Roles and one Worker Role running, as shown in Figure 6.

Figure 6 Two Web Roles and One Worker Role

I wait for the timer event to fire, executing the code that will determine that the average CPU utilization is less than 85 percent, and decrement the WebRole1 instance count. Once this happens, the management page will refresh to reflect an update to the deployment.

Because I’m using small VMs, changing the count by only one and running a lightweight application (a single .aspx page), the update doesn’t take long, and I see the final, auto-shrunk deployment shown in Figure 7.

Figure 7 Now One Web Role and One Worker Role

Blue Skies

I want to share a few final thoughts about the sample in the context of a real implementation.

First, the test is trivial and contrived. In a real implementation you’d need to evaluate more than simple CPU utilization, and you’d need to take into account the quantum over which the collection occurred.

In addition, you need to evaluate the costs of using Azure Storage. Depending on the solution, it might be advisable to scrub the table so it holds only the records of interest. You can lengthen the interval between transfers to storage to reduce transaction costs, or you may want to move the data to SQL Azure to minimize that cost.

You also need to consider what happens during an update. A direct update will cause users to lose connectivity. It may be better to bring the new instances up in staging and then switch the virtual IP address. In either case, however, you’ll have session and viewstate problems. A better solution is to go stateless and disable the test during scale adjustments.

That’s it for my implementation of elasticity in Azure. Download the code sample and start playing with it today.


Joseph Fultz  is an architect at the Microsoft Technology Center in Dallas where he works with both Enterprise Customers and ISVs designing and prototyping software solutions to meet business and market demands. He’s spoken at events such as Tech•Ed and similar internal training events.

Thanks to the following technical expert for reviewing this article: Suraj Puri