August 2012

Volume 27 Number 08

Microsoft Azure - CyberNanny: Remote Access via Distributed Components

By Angel Hernandez | August 2012

This article is about an application called CyberNanny, which I recently wrote to allow me to remotely see my baby daughter Miranda at home from anywhere at any time. It’s written in Visual C++ (MFC) and it comprises different technologies such as Kinect and its SDK, Azure, Web services and Office automation via Outlook. The project is hosted on CodePlex (cybernanny.codeplex.com), where you can check out the code or contribute to it.

Before I get into the nuts and bolts of the application, I’ll briefly explain the technologies used to build it.

C++ has been, and still is, the workhorse in many software shops. That said, the new C++11 standard takes the language to a new level. Three terms to describe it would be modern, elegant and extremely fast. Also, MFC is still around and Microsoft has been upgrading it with every new release of its Visual C++ compiler.

The Kinect technology is amazing, to say the least; it changes the way we interact with games and computers. And with Microsoft providing developers with an SDK, a new world of opportunities is unveiled for creating software that requires human interaction. Interestingly, though, the Kinect SDK is based on COM (as well as the new programming model in Windows 8, called Windows Runtime, often abbreviated as WinRT). The SDK is also available to Microsoft .NET Framework languages.

Azure is the Microsoft Platform as a Service (PaaS) offering that has been around for a couple of years. It provides a series of services that allow building solutions on top of them (such as Compute and Storage). One of the requirements I had with CyberNanny was the reliable delivery of messages via a highly available queue, and Azure provides that.

The native use and consumption of Web services is possible using the Windows Web Services API (WWSAPI), which was introduced with Windows 7. I have a blog post (bit.ly/LiygQY) that describes a Windows Presentation Foundation (WPF) application implementing a native component using WWSAPI. It’s important to mention that WWSAPI is built in to the OS, so there’s no need to download or install anything but the Windows SDK (for header and lib files).

Why reinvent the wheel? One of the requirements for CyberNanny was the ability to send e-mails with attached pictures, so instead of writing my own e-mailing class, I preferred to reuse the functionality provided by Outlook for this task. This allowed me to focus on the main objective: building a distributed application for looking after my baby.

This article is organized in four main sections:

  1. Overview of the general architectural solution
  2. Kinect architecture
  3. Locally deployed components (native)
  4. Cloud-hosted components (managed)

Overview of the General Architectural Solution

The CyberNanny concept is simple (as shown in Figure 1), but it also has some moving pieces. It can briefly be described as a thick client written in Visual C++, which captures frames via the Kinect sensor. These frames can later be used as a picture that’s attached to a new e-mail composed in Outlook through automation. The application is notified about pending requests by spawning a thread triggered from a timer, which polls a queue hosted in Azure. The requests are inserted into the queue via an ASP.NET Web page.

Figure 1 CyberNanny Architecture
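The polling half of this flow can be sketched in portable C++. This is only an illustration under stated assumptions, not the CyberNanny source: InMemoryQueue is a hypothetical stand-in for the call to the Azure-hosted queue, and Request, TryPop and OnPollTimer are names invented for the example.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical request shape; the real one comes from the service contract.
struct Request {
    std::string emailRecipient;
};

// Hypothetical stand-in for the cloud queue (the real client calls a Web service).
class InMemoryQueue {
public:
    void Push(Request request) { m_items.push_back(std::move(request)); }

    // Copies the next pending request into `out`; returns false if the queue is empty.
    bool TryPop(Request& out) {
        if (m_items.empty()) return false;
        out = std::move(m_items.front());
        m_items.pop_front();
        return true;
    }

private:
    std::deque<Request> m_items;
};

// Meant to run on every timer tick: handles at most one pending request.
// Returns true if a request was found and passed to the handler.
bool OnPollTimer(InMemoryQueue& queue,
                 const std::function<void(const Request&)>& handler) {
    assert(static_cast<bool>(handler));
    Request pending;
    if (!queue.TryPop(pending)) return false;  // Nothing to do on this tick
    handler(pending);                          // For example: take a picture and e-mail it
    return true;
}
```

Handling at most one request per tick keeps each tick short; a real client could just as well drain the queue in a loop.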

Note that in order to run and test the solution you must have:

  • Kinect sensor (I used the one on my Xbox 360)
  • Azure subscription
  • Kinect SDK

Kinect Architecture

Having a good architectural understanding of how things work and how they can be implemented is crucial to development projects, and in this case Kinect is no exception. Microsoft has provided an SDK for managed and native code developers. I’ll describe the architecture Kinect is built upon, as shown in Figure 2.

Figure 2 Kinect for Windows Architecture

The circled numbers in Figure 2 correspond to the following:

  1. Kinect hardware: The hardware components, including the Kinect and the USB hub through which the sensor is connected to the computer.
  2. Kinect drivers: The Windows drivers for the Kinect, which are installed as part of the SDK setup process as described in this article. The Kinect drivers support:
    • The Kinect microphone array as a kernel-mode audio device that you can access through the standard audio APIs in Windows.
    • Audio and video streaming controls for streaming audio and video (color, depth and skeleton).
    • Device enumeration functions that enable an application to use more than one Kinect.
  3. Audio and video components: The Kinect Natural User Interface (NUI) for skeleton tracking, audio, color and depth imaging.
  4. DirectX Media Object (DMO): This is for microphone array beam forming and audio source localization.
  5. Windows 7 standard APIs: The audio, speech and media APIs in Windows 7, as described in the Windows 7 SDK and the Microsoft Speech SDK.

I’ll demonstrate how I used the video component for capturing frames that are then saved as JPEG files for e-mailing purposes. The rendering of the captured frames is done via Direct2D.

The Nui_Core Class

I wrote a class called Nui_Core, which encapsulates the functionality I needed from the Kinect sensor. There’s a single instance of this object in the application. The application interacts with the sensor via a member of type INuiSensor that represents the physical device connected to the computer. It’s important to remember that the Kinect SDK is COM-based, hence the aforementioned interface, as well as all the other COM interfaces used throughout the application, is managed by smart pointers (for example, CComPtr<INuiSensor> m_pSensor;).

The steps to start capturing frames with the sensor are:

  1. Check whether there’s a sensor available by calling NuiGetSensorCount.
  2. Create an instance of the Kinect sensor by calling NuiCreateSensorByIndex.
  3. Create a factory object for the creation of Direct2D resources by calling D2D1CreateFactory.
  4. Create events for each stream required by the application.
  5. Open the streams by calling NuiImageStreamOpen.
  6. Process the captured data (frame).

Once the Nui_Core instance is set up, you can easily take a picture on demand by calling the TakePicture method, as shown in Figure 3.

Figure 3 The TakePicture Method

void Nui_Core::TakePicture(std::shared_ptr<BYTE>& imageBytes, int& bytesCount) {
  NUI_IMAGE_FRAME imageFrame;
  NUI_LOCKED_RECT LockedRect;
  bytesCount = 0;  // Report zero bytes unless a frame is actually copied
  if (SUCCEEDED(m_pSensor->NuiImageStreamGetNextFrame(m_hVideoStream,
    m_millisecondsToWait, &imageFrame))) {
    auto pTexture = imageFrame.pFrameTexture;
    pTexture->LockRect(0, &LockedRect, NULL, 0);
    if (LockedRect.Pitch != 0) {
      auto bytes = static_cast<BYTE *>(LockedRect.pBits);
      m_pDrawColor->Draw(bytes, LockedRect.size);
      // Copy while the texture is still locked; the array deleter matches new[]
      imageBytes.reset(new BYTE[LockedRect.size], std::default_delete<BYTE[]>());
      memcpy(imageBytes.get(), bytes, LockedRect.size);
      bytesCount = LockedRect.size;
    }
    pTexture->UnlockRect(0);
    m_pSensor->NuiImageStreamReleaseFrame(m_hVideoStream, &imageFrame);
  }
}

Note that you pass a smart pointer to store the bytes of the image as well as the number of bytes that are copied to it, and then this information is used to handcraft your bitmap.
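To make “handcrafting” a bitmap concrete, here’s a minimal portable sketch that wraps raw pixels in a BMP container. It assumes the frame is tightly packed at 32 bits per pixel; BuildBmp and AppendLE are names made up for this example, and the application itself relies on GDI+ rather than on code like this.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends the low `byteCount` bytes of `value` in little-endian order,
// which is the byte order the BMP format uses.
static void AppendLE(std::vector<std::uint8_t>& out, std::uint32_t value, int byteCount) {
    for (int i = 0; i < byteCount; ++i)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Wraps raw 32-bpp pixels in a BMP file image: a 14-byte file header,
// a 40-byte info header, then the pixel data.
// topDown == true stores a negative height, marking the first row as the top row.
std::vector<std::uint8_t> BuildBmp(const std::uint8_t* pixels, std::uint32_t width,
                                   std::uint32_t height, bool topDown) {
    assert(pixels != nullptr && width > 0 && height > 0);
    const std::uint32_t headerSize = 14 + 40;
    const std::uint32_t pixelBytes = width * height * 4;  // 4-byte pixels need no row padding

    std::vector<std::uint8_t> bmp;
    bmp.reserve(headerSize + pixelBytes);

    // File header
    bmp.push_back('B');
    bmp.push_back('M');
    AppendLE(bmp, headerSize + pixelBytes, 4);  // Total file size
    AppendLE(bmp, 0, 4);                        // Two reserved 16-bit fields
    AppendLE(bmp, headerSize, 4);               // Offset of the pixel data

    // Info header
    AppendLE(bmp, 40, 4);                       // Size of this header
    AppendLE(bmp, width, 4);
    const std::int32_t signedHeight = topDown ? -static_cast<std::int32_t>(height)
                                              : static_cast<std::int32_t>(height);
    AppendLE(bmp, static_cast<std::uint32_t>(signedHeight), 4);
    AppendLE(bmp, 1, 2);                        // Color planes
    AppendLE(bmp, 32, 2);                       // Bits per pixel
    AppendLE(bmp, 0, 4);                        // Compression: none
    AppendLE(bmp, pixelBytes, 4);               // Size of the pixel data
    AppendLE(bmp, 0, 4);                        // Horizontal resolution (unspecified)
    AppendLE(bmp, 0, 4);                        // Vertical resolution (unspecified)
    AppendLE(bmp, 0, 4);                        // Palette colors used
    AppendLE(bmp, 0, 4);                        // Important colors

    bmp.insert(bmp.end(), pixels, pixels + pixelBytes);
    return bmp;
}
```

The sign of the height field matters: a positive height tells viewers the rows are stored bottom-up, so the same pixel data shows up vertically flipped if the wrong sign is written.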

It’s important to mention that once you’ve finished using the sensor, it has to be shut down by calling NuiShutdown, and handles that were used need to be released.

The DrawDevice Class

As previously mentioned, the rendering capabilities are provided by Direct2D; that’s why another support class is required for use in conjunction with Nui_Core. This class is responsible for ensuring there are resources available for the captured frame, such as a bitmap in this case.

The three main methods are Initialize, Draw and EnsureResources. I’ll describe each.

Initialize: This is responsible for setting up three members of type DrawDevice. The application has a tab control with three tabs, so there’s a member for each tab (Color, Skeletal and Depth view). Each tab is a window that’s responsible for rendering its corresponding frame. The InitializeColorView shown in the following code is a good example of calling the Initialize method:

bool Nui_Core::InitializeColorView() {
  // Create the DrawDevice that renders color frames on the first tab
  m_pDrawColor = std::make_shared<DrawDevice>();
  return m_pDrawColor->Initialize(m_views[TAB_VIEW_1]->m_hWnd,
    m_pD2DFactory.p, 640, 320, NULL);
}

Draw: This renders a frame on the proper tab. It takes as an argument a BYTE* that points to the data captured by the sensor. Just as in the movies, the effect of animation comes from the successive rendering of static frames.

EnsureResources: This is responsible for creating a bitmap when requested by the Draw method.

Locally Deployed Components (Native)

The CyberNanny project comprises the following:

  • Application
    • CCyberNannyApp (inherited from CWinApp). The application has a single member of type Nui_Core for interacting with the sensor.
  • UI Elements
    • CCyberNannyDlg (Main Window, inherited from CDialogEx)
    • CAboutDlg (About Dialog, inherited from CDialogEx)
  • Web Service Client
    • Files auto-generated by running WSUTIL against the service’s Web Services Description Language (WSDL) file. These files contain the messages, structures and methods exposed by the WCF Web service.
  • Outlook Object Classes
    • In order to manipulate some of the Outlook objects, you have to import them into your project by selecting “Add MFC Class” from ActiveX Control Wizard. The objects used in this solution are Application, Attachment, MailItem and Namespace.
  • Proxy
    • This is a custom class that encapsulates the creation of the required objects to interact with WWSAPI.
  • Helper Classes
    • These classes are used to support the functionality of the application, such as converting a bitmap into a JPEG to reduce the file size, providing a wrapper to send e-mails and interact with Outlook, and so on.

When the application starts, the following events occur:

  1. A new window message is defined by calling RegisterWindowMessage. This is for adding items to the list of events when a request is processed. This is required because you can’t directly modify UI elements from a thread different from the UI thread, or you’ll incur an illegal cross-thread call. This is managed by the MFC messaging infrastructure.
  2. You initialize your Nui_Core member and set up a couple of timers (one for updating the current time on the status bar and another one that kicks off a thread for polling the queue to check whether there’s a pending request).
  3. The Kinect sensor starts capturing frames, but the application doesn’t take a picture unless there’s a request in the queue. The ProcessRequest method is responsible for taking a picture, serializing the picture to disk, writing to the event viewer and kicking off the Outlook automation, as shown in Figure 4.

Figure 4 The ProcessRequest Method Call

void CCyberNannyDlg::ProcessRequest(_request request) {
  if (!request.IsEmpty) {
    auto byteCount = 0;
    ImageFile imageFile;
    std::shared_ptr<BYTE> bytes;
    // Grab the current frame, then write it to disk as an image file
    m_Kinect.TakePicture(bytes, byteCount);
    imageFile.SerializeImage(bytes, byteCount);
    EventLogHelper::LogRequest(request);
    // E-mail the picture through Outlook, then delete the file
    m_emailer.ComposeAndSend(request.EmailRecipient,
      imageFile.ImageFilePath_get());
    imageFile.DeleteFile();
  }
}
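The cross-thread rule from step 1 can be illustrated without MFC. The sketch below is a portable analogue, not the application’s code: a worker posts text into a mailbox, and only the thread that owns the UI drains it. UiMailbox, Post and Drain are hypothetical names; in an MFC application the same roles would typically be played by posting the registered message and handling it on the dialog, which is an assumption about how the application wires this up.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Portable analogue of posting a custom message to the UI thread.
class UiMailbox {
public:
    // Safe to call from any thread: the worker only enqueues text
    // and never touches UI state directly.
    void Post(std::string text) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push(std::move(text));
    }

    // Called only by the thread that owns the UI. Hands every pending item
    // to `addToList` and returns how many items were delivered.
    std::size_t Drain(const std::function<void(const std::string&)>& addToList) {
        assert(static_cast<bool>(addToList));
        std::queue<std::string> batch;
        {
            // Hold the lock only long enough to take the pending items
            std::lock_guard<std::mutex> lock(m_mutex);
            std::swap(batch, m_pending);
        }
        std::size_t delivered = 0;
        for (; !batch.empty(); batch.pop(), ++delivered)
            addToList(batch.front());
        return delivered;
    }

private:
    std::mutex m_mutex;
    std::queue<std::string> m_pending;
};
```

Swapping the queue out under the lock means the UI callback runs without holding the mutex, so a slow list update never blocks the worker.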

The frame originally captured by Kinect is a bitmap that’s approximately 1.7MB in size (which isn’t convenient for e-mailing and therefore needs to be converted to a JPEG image). It’s also upside down, so a 180° rotation is required. This is done by making a couple of calls to GDI+. This functionality is encapsulated in the ImageFile class.

The ImageFile class serves as a wrapper for performing operations with GDI+. The two main methods are:

  1. SerializeImage: This method takes a shared_ptr<BYTE>, which contains the bytes of the captured frame to be serialized as an image, as well as the count of bytes. The image is also rotated by calling the RotateFlip method.
  2. GetEncoderClsid: As mentioned, the image file size is too big to use as an attachment—therefore, it needs to be encoded to a format with a smaller footprint (JPEG, for example). GDI+ provides a GetImageEncoders function that lets you find out which encoders are available on the system.
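To show what a 180° rotation does to the pixel buffer itself, here’s a minimal portable sketch. It assumes tightly packed 32-bit pixels, and Rotate180 is a name invented for the example; the application leaves this work to GDI+.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rotates a tightly packed 32-bpp image by 180 degrees in place.
// The pixel at (x, y) moves to (width - 1 - x, height - 1 - y), which in a
// row-major buffer is the same as reversing the order of all pixels.
void Rotate180(std::vector<std::uint32_t>& pixels, std::size_t width, std::size_t height) {
    assert(pixels.size() == width * height);
    std::reverse(pixels.begin(), pixels.end());
}
```

Because the operation is a plain reversal, applying it twice restores the original image.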

So far I’ve covered how the application utilizes the Kinect sensor and how the frames captured are used to create a picture for e-mailing. Next, I’ll show you how to call the WCF service hosted on Azure.

WWSAPI, introduced in Windows 7, allows native developers to consume Web or WCF services in an easy and convenient way, without worrying about the communication (sockets) details. The first step for consuming a service is to obtain its WSDL and run WSUTIL against it, which in turn generates C code for the service proxy along with the data structures required by the service. There is an alternative called Casablanca (bit.ly/JLletJ), which supports cloud-based client-server communication in native code, but it wasn’t available when I wrote CyberNanny.

It’s common to get the WSDL and save it to disk, and then use the WSDL file and related schema files as input for WSUTIL. One aspect to take into account is schemas. They must be downloaded along with the WSDL; otherwise, WSUTIL will complain when producing the files. You can easily determine the required schemas by checking the .xsd parameter in the schema section of the WSDL file:

wsutil /wsdl:cybernanny.wsdl /xsd:cybernanny0.xsd cybernanny1.xsd cybernanny2.xsd cybernanny3.xsd /string:WS_STRING

The resulting files can be added to the solution, and then you proceed to call your service via the codegen files. Four main objects are required to use with WWSAPI:

  1. WS_HEAP
  2. WS_ERROR
  3. WS_SERVICE_PROXY
  4. WS_CHANNEL_PROPERTY

These objects allow the interaction between the client and the service. I put together the functionality to invoke the service in the Proxy class.

Most of the WWSAPI functions return an HRESULT, so debugging errors can be a challenging task. But fear not, because you can enable the tracing from the Windows Event Viewer and see exactly why a given function failed. To enable tracing, navigate to Applications and Services Logs | Microsoft | WebServices | Tracing (right-click it to enable it).
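Before turning to the trace, it can help to split a failing HRESULT into its parts. The sketch below is portable and uses stand-in names (HResult, HResultParts, Decode and Describe are made up for this example); on Windows you’d use the HRESULT type and the FAILED, HRESULT_FACILITY and HRESULT_CODE macros instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Portable stand-in for the Windows HRESULT type (a 32-bit signed integer).
using HResult = std::int32_t;

struct HResultParts {
    bool failed;             // Severity bit (bit 31): set means failure
    std::uint32_t facility;  // Which subsystem reported the result
    std::uint32_t code;      // The subsystem-specific status code (low 16 bits)
};

// Splits an HRESULT using the same masks as the Windows macros.
HResultParts Decode(HResult hr) {
    const std::uint32_t bits = static_cast<std::uint32_t>(hr);
    return HResultParts{ (bits >> 31) != 0, (bits >> 16) & 0x1FFF, bits & 0xFFFF };
}

// Formats an HRESULT for a log line.
std::string Describe(HResult hr) {
    const HResultParts parts = Decode(hr);
    char buffer[64];
    const int written = std::snprintf(buffer, sizeof(buffer),
        "0x%08X (%s, facility %u, code %u)",
        static_cast<unsigned>(hr), parts.failed ? "failure" : "success",
        static_cast<unsigned>(parts.facility), static_cast<unsigned>(parts.code));
    assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
    return std::string(buffer);
}
```

For example, E_INVALIDARG (0x80070057) decodes to a failure in facility 7 (Win32) with code 87, the Win32 “invalid parameter” error. The facility and code tell you where to look; the trace tells you why the call failed.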

That pretty much covers the native components of the solution. For more information, please refer to the source code on the aforementioned CodePlex site. The next section is about the Azure component of the solution.

Cloud-Hosted Components (Managed)

Please note that this is not an extensive tutorial on Azure, but rather a description of the Azure components in CyberNanny. For more in-depth and detailed information, refer to the Azure Web site at windowsazure.com. The Azure platform (Figure 5) comprises the following services:

  • Azure Compute
  • Azure Storage
  • Azure SQL Database
  • Azure AppFabric
  • Azure Marketplace
  • Azure Virtual Network

Figure 5 Azure Platform Services

CyberNanny has a single Web Role, with two instances allocated to guarantee high availability. If one of the nodes fails, the platform will switch to the healthy node. The Web Role is an ASP.NET application, and it only inserts message items into a queue. These messages are then dequeued by CyberNanny. There’s also a WCF service, part of the Web Role, that’s responsible for handling the queue.

Note that an Azure role is an individual component running in the cloud, where each instance of a role corresponds to a virtual machine (VM) instance. In CyberNanny’s case, then, I’ve allocated two VMs.

CyberNanny has a Web Role that’s a Web application (whether it’s only ASPX pages or WCF services) running on IIS. It’s accessible via HTTP/HTTPS endpoints. There’s also another type of role that’s called a Worker Role. It’s a background processing application (for example, for financial calculations), and it also has the ability to expose Internet-facing and internal endpoints.

This application also utilizes a queue provided by Azure Storage, which allows reliable storage and delivery of messages. The beauty of the queue is that you don’t have to write any specialized code to take advantage of it. Neither are you responsible for setting up the data storage with a certain structure to resemble a queue, because all this functionality is provided out of the box by the platform.
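One reason such a queue can deliver messages reliably is that Azure Storage queues use a two-step pattern: retrieving a message only hides it for a visibility timeout, and the consumer must delete it explicitly once the work is done. If the consumer crashes first, the message reappears. The in-memory model below sketches that behavior; VisibilityQueue and its methods are hypothetical names, not the Azure API, and whether CyberNanny’s service follows exactly this pattern is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Message {
    std::uint64_t id;
    std::string body;
    std::uint64_t visibleAt;  // Time (in seconds) from which the message can be retrieved
};

// In-memory model of get-then-delete queue semantics.
// Time is passed in explicitly so the behavior is deterministic.
class VisibilityQueue {
public:
    explicit VisibilityQueue(std::uint64_t visibilityTimeout) : m_timeout(visibilityTimeout) {
        assert(visibilityTimeout > 0);
    }

    // Adds a message that is visible immediately; returns its id.
    std::uint64_t Put(const std::string& body, std::uint64_t now) {
        m_messages.push_back(Message{ m_nextId, body, now });
        return m_nextId++;
    }

    // Retrieves the first visible message and hides it until now + timeout.
    // The message stays in the queue until Delete is called.
    bool Get(std::uint64_t now, Message& out) {
        for (auto& message : m_messages) {
            if (message.visibleAt <= now) {
                message.visibleAt = now + m_timeout;
                out = message;
                return true;
            }
        }
        return false;
    }

    // Removes a message once it has been processed successfully.
    bool Delete(std::uint64_t id) {
        for (auto it = m_messages.begin(); it != m_messages.end(); ++it) {
            if (it->id == id) {
                m_messages.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    std::uint64_t m_timeout;
    std::uint64_t m_nextId = 1;
    std::vector<Message> m_messages;
};
```

With a 30-second timeout, a message retrieved at time 0 can’t be retrieved again at time 10, but it can at time 30 unless it was deleted in between.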

Besides high availability and scalability, one of the benefits provided by the Azure platform is a common way of developing, testing and deploying Azure solutions from Visual Studio, with .NET as the lingua franca for building solutions.

There are some other cool features I’d love to add to CyberNanny, such as motion detection and speech recognition. If you want to use this software or contribute to the project, please feel free to do so. The technologies used are available now and even though they look “different,” they can interoperate and play nicely with one another.   

Happy coding!


Angel Hernandez Matos is a manager in the Enterprise Applications team at Avanade Australia. He’s based in Sydney, Australia, but is originally from Caracas, Venezuela. He has been a Microsoft MVP award recipient for eight consecutive years and is currently an MVP in Visual C++. He has been writing software since he was 12 years old and considers himself an “existential geek.”

Thanks to the following technical experts for reviewing this article: Scott Berry, Diego Dagum, Yonghwi Kwon and Nish Sivakumar