October 2016

Volume 31 Number 10

[Cognitive Services]

Face and Emotion Recognition in Xamarin.Forms with Microsoft Cognitive Services

By Alessandro Del Sole

At the Build 2016 conference, Microsoft announced a first preview of Cognitive Services, a rich set of cross-platform, RESTful APIs that you can leverage to create the next generation of apps based on natural user interaction for any platform on any device. Cognitive Services, also known as “Project Oxford,” are based on machine learning and fit well into the conversation-as-a-platform philosophy that Microsoft wants to bring to the apps ecosystem. At a high level, the Cognitive Services APIs are available through RESTful services and currently fall into the following categories:

  • Vision: The Vision services offer APIs that allow you to analyze images and videos to identify faces and emotions, and to detect actionable information. This category includes the Computer Vision, Face, Emotion and Video APIs.
  • Speech: The Speech services offer APIs that make it easier to implement text to speech, natural speech recognition, and even to recognize who’s talking with the speaker recognition service. They include the Bing Speech, Custom Recognition Intelligent Service and Speaker Recognition APIs.
  • Language: The Language services are oriented to natural language understanding, which means detecting and fixing spelling errors, understanding voice commands, and analyzing complex text including sentiments and key phrases. They include the Bing Spell Check, Language Understanding Intelligent Service, Linguistic Analysis, Text Analytics and Web Language Model APIs.
  • Knowledge: The Knowledge services help applications extend customers’ knowledge by finding personalized product recommendations, events, locations, and academic papers or journals. They include the Academic Knowledge, Entity Linking Intelligence Service, Knowledge Exploration Service and Recommendations APIs.
  • Search: The Search services are based on Bing and allow you to implement powerful search tools in your apps. The included services’ names are self-explanatory: Bing Autosuggest, Bing Image Search, Bing News Search, Bing Video Search and Bing Web Search APIs.

In this article I’ll explain how to combine the Face and Emotion APIs to retrieve face details and emotions from pictures that you take with a camera or pick from an album on disk, in a Xamarin.Forms app created with C# and Visual Studio 2015 and running on Android, iOS or Windows 10. Figure 1 shows the results of the article’s tutorial. It’s worth mentioning that, although I use Xamarin.Forms for this article, the same can be done with traditional Xamarin apps, as well as with any other platform that supports REST. I’m assuming you have basic knowledge of creating a Xamarin.Forms app and of the concepts of code sharing; if not, make sure you read my previous articles: “Build a Cross-Platform UX with Xamarin.Forms” and “Share UI Code Across Mobile Platforms with Xamarin.Forms.”

Figure 1 Face and Emotion Recognition on a Cross-Platform App with Xamarin.Forms (Android Device on Left, Windows 10 Desktop on Right)

Subscribing for Cognitive Services APIs

In order to build apps that take advantage of Cognitive Services, you must subscribe to the service in which you’re interested. At the moment, Microsoft is offering free trials that you can activate in the subscriptions page, but the current plans may be subject to changes in the future. When on the page, register with a Microsoft account, then click “Request new trials.” You’ll then see a list of available services; make sure you select free previews of both the Face and Emotion APIs. At this point, your subscriptions page will show the list of active services; you should see the Face and Emotion APIs subscriptions. Figure 2 shows an example based on my subscriptions. Notice how, for each active service, there are two secret keys. You’ll need one to invoke the APIs. For now, keep them hidden. You’ll unhide the key when creating the Xamarin.Forms app.

Figure 2 Activating Subscriptions for Face and Emotion APIs

Generally speaking, Cognitive Services provide RESTful APIs, which means you can interact with these services via HTTP requests on any platform and with any language supporting REST. For example, the following HTTP POST request demonstrates how to send an image to the emotion recognition service for emotion detection:

Content-Type: application/json
Content-Length: 107
Ocp-Apim-Subscription-Key: YOUR-KEY-GOES-HERE

{ "url": "" }

Of course, you must replace the Ocp-Apim-Subscription-Key value with one of your own keys and supply a real image address in the url field. In exchange, the Emotion recognition service will send back the result of detection as a JSON response, as shown in Figure 3.
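For readers who prefer to see the same kind of request issued from C#, here is a minimal sketch based on HttpClient. The endpoint address, the key and the image URL are all passed in by the caller because they aren’t shown above, and the class and method names (EmotionRestSketch, BuildRequest, SendAsync) are illustrative only; they aren’t part of any Microsoft library.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class EmotionRestSketch
{
    // Builds the POST request. Endpoint, key and image URL are supplied by
    // the caller; none of them is a value taken from the article.
    public static HttpRequestMessage BuildRequest(
        Uri endpoint, string subscriptionKey, string imageUrl)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint);
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

        // Naive JSON construction: assumes imageUrl contains no quotes or
        // backslashes. Use a JSON serializer for untrusted input.
        string body = "{ \"url\": \"" + imageUrl + "\" }";
        request.Content = new StringContent(body, Encoding.UTF8, "application/json");
        return request;
    }

    // Sends the request and returns the raw JSON response text.
    public static async Task<string> SendAsync(
        HttpClient client, HttpRequestMessage request)
    {
        using (HttpResponseMessage response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without a network connection.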

Figure 3 The Emotion Recognition Service Detection Response

[
  {
    "faceRectangle": {
      "height": 70,
      "left": 26,
      "top": 35,
      "width": 70
    },
    "scores": {
      "anger": 2.012591E-11,
      "contempt": 1.95578984E-10,
      "disgust": 1.02281912E-10,
      "fear": 1.16242682E-13,
      "happiness": 1.0,
      "neutral": 9.79047E-09,
      "sadness": 2.91102975E-10,
      "surprise": 1.71011272E-09
    }
  }
]

The sample response in Figure 3 shows how the Emotion service returned the rectangle in which a face was detected and an object called scores containing a list of emotions, each with a value between 0 and 1 that indicates how likely the emotion is to be true. In general, sending HTTP requests to RESTful services and expecting a JSON response is a common approach with all of the Cognitive Services. However, for .NET developers working with C#, Microsoft also offers portable client libraries that you can download from NuGet and that make it easier to interact with the services in managed code and in a fully object-oriented way. This is the case for the Face and Emotion APIs, as you’ll see shortly. Don’t forget to check out the official documentation, which contains examples based on both the REST approach and on client libraries where available. Now that you’ve registered for both services and you have your keys, it’s time to create a cross-platform app with Xamarin.Forms and Microsoft Visual Studio 2015.

Creating a Xamarin.Forms Application

As you know, you can create a cross-platform app with Xamarin.Forms by choosing either the Portable or the Shared project template. Because I’ll explain how to leverage client libraries for the Cognitive Services APIs, the sample application is based on the Portable Class Library (PCL) model. In Visual Studio 2015, select File | New Project. If you’ve installed the latest updates from Xamarin, you’ll find a new project template called Blank Xaml App (Xamarin.Forms Portable) under the Visual C#, Cross-Platform node of the New Project dialog. This is an interesting template that provides a blank XAML page, and avoids the need to create one manually. Figure 4 shows the new template.

Call the solution FaceEmotionRecognition and click OK. During the generation of the solution, you’ll be asked to specify the minimum target version for the Universal Windows Platform (UWP) project. This is left to your choice, but I recommend targeting the highest version available.

Figure 4 Creating a New Xamarin.Forms Application

Introducing Plug-ins for Xamarin

The sample application will use the Cognitive Services APIs to recognize face details and emotions from pictures, using existing pictures from the device or taking new pictures from the camera. This implies that the app will need to access the Internet to connect to the services and will need to provide the ability to take and select pictures. While an app can easily connect to a network, it’s your responsibility, as the developer, to check for network availability. Features like checking for network availability and taking pictures would normally require writing specific code in the Android, iOS and Windows projects. Luckily, Xamarin supports plug-ins that you can use in Xamarin.Forms and install to the PCL project, so that they’ll do the job for you. A plug-in is a library installed from NuGet that wraps the native APIs into a common code implementation that you invoke from the PCL project. There’s a large number of plug-ins, some developed and supported by Xamarin and others created and published by the developer community. Plug-ins are all open source and listed on GitHub. In this article I’ll show how to use the Connectivity and Media plug-ins.

Installing NuGet Packages

When the solution is ready, the first thing you need to do is install the following NuGet packages:

  • Microsoft.ProjectOxford.Face: Installs the client library for the Face APIs and must be installed to the PCL project only.
  • Microsoft.ProjectOxford.Emotion: Installs the client library for the Emotion APIs and, like the Face library, must be installed to the PCL project only.
  • Xam.Plugin.Connectivity: Contains the Connectivity plug-in for Xamarin.Forms and must be installed to all the projects in the solution.
  • Xam.Plugin.Media: Contains the Media plug-in for Xamarin.Forms and, like the Connectivity plug-in, must be installed to all the projects in the solution.

Once you’ve installed the required NuGet packages, make sure you build the solution before writing code so that all references will be refreshed.

Designing the UI

The sample application’s UI consists of a single page. For the sake of simplicity, I’ll use the auto-generated MainPage.xaml file. This page defines two buttons, one for taking a picture from the camera and one for uploading an existing image; an ActivityIndicator control that will show a busy status while waiting for a response from the service; an Image control that will display the selected image; a number of labels, within StackLayout panels, that are data-bound to a custom class that will contain the result of detections over the selected picture. Figure 5 shows the full XAML code for the page.

Figure 5 The UI for the Main Page

<?xml version="1.0" encoding="utf-8" ?>
<ContentPage xmlns="http://xamarin.com/schemas/2014/forms"
  xmlns:x="http://schemas.microsoft.com/winfx/2009/xaml"
  x:Class="FaceEmotionRecognition.MainPage">
  <StackLayout Orientation="Vertical">
    <Button x:Name="TakePictureButton" Clicked="TakePictureButton_Clicked"
      Text="Take from camera"/>
    <Button x:Name="UploadPictureButton" Clicked="UploadPictureButton_Clicked"
      Text="Pick a photo"/>
    <ActivityIndicator x:Name="Indicator1" IsVisible="False" IsRunning="False" />
    <Image x:Name="Image1" HeightRequest="240" />
    <StackLayout Orientation="Horizontal" Padding="3">
      <Label Text="Gender: "/>
      <Label x:Name="GenderLabel" Text="{Binding Path=Gender}" />
    </StackLayout>
    <StackLayout Orientation="Horizontal" Padding="3">
      <Label Text="Age: "/>
      <Label x:Name="AgeLabel" Text="{Binding Path=Age}"/>
    </StackLayout>
    <StackLayout Orientation="Horizontal" Padding="3">
      <Label Text="Emotion: "/>
      <Label x:Name="EmotionLabel" Text="{Binding Path=Emotion}"/>
    </StackLayout>
    <StackLayout Orientation="Horizontal" Padding="3">
      <Label Text="Smile: "/>
      <Label x:Name="SmileLabel"
        Text="{Binding Path=Smile}"/>
    </StackLayout>
    <StackLayout Orientation="Horizontal" Padding="3">
      <Label Text="Glasses: "/>
      <Label x:Name="GlassesLabel" Text="{Binding Path=Glasses}"/>
    </StackLayout>
    <StackLayout Orientation="Horizontal" Padding="3">
      <Label Text="Beard: "/>
      <Label x:Name="BeardLabel"
        Text="{Binding Path=Beard}"/>
    </StackLayout>
    <StackLayout Orientation="Horizontal" Padding="3">
      <Label Text="Moustache: "/>
      <Label x:Name="MoustacheLabel"
        Text="{Binding Path=Moustache}"/>
    </StackLayout>
  </StackLayout>
</ContentPage>

The next step is to prepare a place to store the result of the face and emotion detection.

Storing Detection Results with a Class

Instead of manually populating labels in the UI with the results of face and emotion detection, it’s best practice to create a custom class. Not only is this a more object-oriented approach, but it also allows for data binding the class’ instance to the UI. That said, let’s create a new class called FaceEmotionDetection:

public class FaceEmotionDetection
{
  public string Emotion { get; set; }
  public double Smile { get; set; }
  public string Glasses { get; set; }
  public string Gender { get; set; }
  public double Age { get; set; }
  public double Beard { get; set; }
  public double Moustache { get; set; }
}

Each property has a self-explanatory name and will store information that comes from the combination of both the Face and Emotion APIs.

Declaring the Service Clients

Before you write any other code, it’s a good idea to add the following using directives:

using Microsoft.ProjectOxford.Emotion;
using Microsoft.ProjectOxford.Emotion.Contract;
using Microsoft.ProjectOxford.Face;
using Microsoft.ProjectOxford.Face.Contract;
using Plugin.Connectivity;
using Plugin.Media;
// MediaFile and StoreCameraMediaOptions live in this namespace
using Plugin.Media.Abstractions;

These let you refer to the types in the Cognitive Services APIs and the plug-ins without fully qualifying their names. The Face APIs and the Emotion APIs provide the Microsoft.ProjectOxford.Face.FaceServiceClient and Microsoft.ProjectOxford.Emotion.EmotionServiceClient classes, which connect to the Cognitive Services and return information about face and emotion details, respectively. What you first need to do is declare an instance of both, passing your secret key to the constructor, as shown here:

private readonly IFaceServiceClient faceServiceClient;
private readonly EmotionServiceClient emotionServiceClient;
public MainPage()
{
  InitializeComponent();
  // Provides access to the Face APIs
  this.faceServiceClient = new FaceServiceClient("YOUR-KEY-GOES-HERE");
  // Provides access to the Emotion APIs
  this.emotionServiceClient = new EmotionServiceClient("YOUR-KEY-GOES-HERE");
}

Notice that you must supply your own secret keys. Both the Face and Emotion API secret keys can be found in the subscriptions page of the Microsoft Cognitive Services portal, as shown in Figure 2.

Capturing and Loading Images

In Xamarin.Forms, accessing both the camera and the file system would require writing platform-specific code. A simpler approach is using the Media plug-in for Xamarin.Forms, which lets you pick pictures and videos from disk and take pictures and videos with the camera from the PCL project, and with just a few lines of code. This plug-in exposes a class called CrossMedia, which exposes the following members:

  • Current: Returns a singleton instance of the CrossMedia class.
  • IsPickPhotoSupported and IsPickVideoSupported: Bool properties that return true if the current device supports selecting pictures and videos from disk.
  • PickPhotoAsync and PickVideoAsync: Methods that invoke the platform-specific UI to select a local picture or video, respectively, and return an object of type MediaFile.
  • IsCameraAvailable: A bool property that returns true if the device has a built-in camera.
  • IsTakePhotoSupported and IsTakeVideoSupported: Bool properties that return true if the current device supports taking pictures and videos from the camera.
  • TakePhotoAsync and TakeVideoAsync: Methods that launch the built-in camera to take a picture or video, respectively, and return an object of type MediaFile.

Do not forget to set the proper permissions in the app manifest to access the camera. For instance, in a UWP project you need both the Webcam and Pictures Library permissions, while on Android you need the CAMERA, READ_EXTERNAL_STORAGE, and WRITE_EXTERNAL_STORAGE permissions. Forgetting to set the required permissions will result in runtime exceptions. Now let’s write the Clicked event handler for the UploadPictureButton, which is shown in Figure 6.

Figure 6 Selecting a Picture from Disk

private async void UploadPictureButton_Clicked(object sender, EventArgs e)
{
  if (!CrossMedia.Current.IsPickPhotoSupported)
  {
    await DisplayAlert("No upload", "Picking a photo is not supported.", "OK");
    return;
  }
  var file = await CrossMedia.Current.PickPhotoAsync();
  if (file == null)
    return;
  this.Indicator1.IsVisible = true;
  this.Indicator1.IsRunning = true;
  Image1.Source = ImageSource.FromStream(() => file.GetStream());
  this.Indicator1.IsRunning = false;
  this.Indicator1.IsVisible = false;
}

The code first checks if selecting pictures is supported, showing an error message and returning if IsPickPhotoSupported returns false. PickPhotoAsync (as well as PickVideoAsync) returns an object of type MediaFile, which is a class defined in the Plugin.Media.Abstractions namespace that represents the selected file, or null if the user cancels the selection. You must invoke its GetStream method to return a stream that can be used as the source for the Image control through the ImageSource.FromStream method. Taking a picture with the camera is also very easy, as shown in Figure 7.

Figure 7 Taking a Picture with the Camera

private async void TakePictureButton_Clicked(object sender, EventArgs e)
{
  await CrossMedia.Current.Initialize();
  if (!CrossMedia.Current.IsCameraAvailable ||
    !CrossMedia.Current.IsTakePhotoSupported)
  {
    await DisplayAlert("No Camera", "No camera available.", "OK");
    return;
  }
  var file = await CrossMedia.Current.TakePhotoAsync(new StoreCameraMediaOptions
  {
    SaveToAlbum = true,
    Name = "test.jpg"
  });
  if (file == null)
    return;
  this.Indicator1.IsVisible = true;
  this.Indicator1.IsRunning = true;
  Image1.Source = ImageSource.FromStream(() => file.GetStream());
  this.Indicator1.IsRunning = false;
  this.Indicator1.IsVisible = false;
}

The point of interest here is that TakePhotoAsync takes a parameter of type StoreCameraMediaOptions, an object that lets you specify where and how to save a picture. You can set the SaveToAlbum property to true if you want the picture to be saved to the local camera roll, or you can set the Directory property if you want to save to a different folder. As you can see, with very limited effort and with a few lines of code, your app can easily leverage an important capability of all the supported platforms.

Detecting Emotions and Implementing Face Recognition

Now it’s time to implement face and emotion recognition. Because this is an introductory article, I’ll focus on simplicity. I’ll show how to implement detection over one single face in a picture and I’ll describe the most important objects and members in the APIs. I’ll also give you suggestions on how to implement more detailed detections where appropriate. Based on these assumptions, let’s start writing an asynchronous method that performs detections. The first piece is about emotion detection and it looks like this:

private async Task<FaceEmotionDetection> DetectFaceAndEmotionsAsync(MediaFile inputFile)
{
  try
  {
    // Get emotions from the specified stream
    Emotion[] emotionResult = await
      emotionServiceClient.RecognizeAsync(inputFile.GetStream());
    // Assuming the picture has one face, retrieve emotions for the
    // first item in the returned array
    var faceEmotion = emotionResult[0]?.Scores.ToRankedList();

The method receives the MediaFile that’s produced by selecting or taking a picture. Detecting emotions over faces in a picture is straightforward, because you simply invoke the RecognizeAsync method on the EmotionServiceClient instance. This method can receive either a stream or a URL as an argument. In this case, it gets a stream from the MediaFile object. RecognizeAsync returns an array of Emotion objects. Each Emotion in the array stores the emotions detected on a single face in a picture. Assuming the selected picture has just one face, the code retrieves the first item in the array. The Emotion type exposes a property called Scores, which contains the scores for eight emotions. By invoking its ToRankedList method, you get an IEnumerable<KeyValuePair<string, float>> sorted from the highest score to the lowest. The APIs cannot detect a single emotion precisely. Instead, they detect a number of possible emotions. The highest value in this list represents the emotion with the highest estimated likelihood, which is possibly the actual emotion on a face, but there are still other values that could be checked. For a better understanding, consider the following ranked list of emotions retrieved with the help of the debugger’s data tips, which is based on the sample picture shown in Figure 1:

[0] = {[Happiness, 1]}
[1] = {[Neutral, 1.089301E-09]}
[2] = {[Surprise, 7.085784E-10]}
[3] = {[Sadness, 9.352855E-11]}
[4] = {[Disgust, 4.52789E-11]}
[5] = {[Contempt, 1.431213E-11]}
[6] = {[Anger, 1.25112E-11]}
[7] = {[Fear, 5.629648E-14]}

As you can see, Happiness has a value of 1, which is the highest in the list and is the estimated likelihood of the actual emotion. The next step is detecting face attributes. The FaceServiceClient class exposes the DetectAsync method, which is extremely powerful. Not only can it retrieve face attributes such as gender, age and smile, but it can also return a face ID that the Face APIs use to recognize people, the face rectangle (the area on the picture where the face was detected), and 27 face landmark points that let an app identify information such as the position of the nose, mouth, ears and eyes on the picture. DetectAsync has the following signature:

Task<Contract.Face[]> DetectAsync(Stream imageStream,
  bool returnFaceId = true, bool returnFaceLandmarks = false,
  IEnumerable<FaceAttributeType> returnFaceAttributes = null);

In its most basic invocation, DetectAsync requires either a stream pointing to a picture or a URL and returns the face rectangle, while the optional returnFaceId and returnFaceLandmarks parameters let you, respectively, retrieve a face ID used to identify a person and return face landmarks. The Face APIs let you create groups of people and assign an id to each person so that you can easily perform recognition. Face landmarks are instead useful to identify a face’s characteristics and are available through the FaceLandmarks property of the Face object. Both identification and landmarks are beyond the scope of this article, but you can find more about these topics in the Face API documentation. In the current sample scenario, the goal is to retrieve face attributes. The first thing you need is an array of FaceAttributeType enumeration values, which defines the list of attributes you want to retrieve:

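// Create a list of face attributes that the
// app will need to retrieve
var requiredFaceAttributes = new FaceAttributeType[] {
  FaceAttributeType.Age, FaceAttributeType.Gender, FaceAttributeType.Smile,
  FaceAttributeType.FacialHair, FaceAttributeType.Glasses
};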

Next, invoke DetectAsync, passing the image stream and the face attributes list. The returnFaceId and returnFaceLandmarks arguments are false because the related information is unnecessary at this point. The method invocation looks like this:

// Get a list of faces in a picture
var faces = await faceServiceClient.DetectAsync(inputFile.GetStream(),
  false, false, requiredFaceAttributes);
// Assuming there is only one face, store its attributes
var faceAttributes = faces[0]?.FaceAttributes;

DetectAsync returns an array of Face objects, each representing a face in the picture. The code takes the first item in the array, which represents one single face, and retrieves its face attributes. Notice how the last line uses the null-conditional operator (?.), introduced with C# 6, which returns null if the first element in the array is null, instead of throwing a NullReferenceException. It doesn’t guard against an empty array, though: if no face is detected, indexing faces[0] throws an IndexOutOfRangeException, which is caught by the method’s exception handling. Now that you have both face and emotion information, you can create an instance of the FaceEmotionDetection class and populate its properties, as demonstrated in the following code:

FaceEmotionDetection faceEmotionDetection = new FaceEmotionDetection();
faceEmotionDetection.Age = faceAttributes.Age;
faceEmotionDetection.Emotion = faceEmotion.FirstOrDefault().Key;
faceEmotionDetection.Glasses = faceAttributes.Glasses.ToString();
faceEmotionDetection.Smile = faceAttributes.Smile;
faceEmotionDetection.Gender = faceAttributes.Gender;
faceEmotionDetection.Moustache = faceAttributes.FacialHair.Moustache;
faceEmotionDetection.Beard = faceAttributes.FacialHair.Beard;

A few considerations at this point:

  • The highest value in the emotions list is taken by invoking FirstOrDefault over the result of the invocation to the Scores.ToRankedList method, which returns an IEnumerable<KeyValuePair<string, float>>.
  • The value returned by FirstOrDefault here is an object of type KeyValuePair<string, float>, and its Key, of type string, stores the emotion name as human-readable text that will be shown in the UI.
  • Glasses is an enumeration that specifies if the detected face is wearing glasses and what kind. The code invokes ToString for the sake of simplicity, but you could definitely implement a converter for different string formatting.
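The ranking and FirstOrDefault steps described in the list above can be illustrated without the client library. The following is only a sketch of the idea, not the library’s implementation: EmotionRankingSketch, Rank and TopEmotion are hypothetical names, and the scores are passed in as plain key/value pairs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmotionRankingSketch
{
    // Sorts emotion scores from most to least likely, similar in spirit
    // to the ranked list shown earlier.
    public static IEnumerable<KeyValuePair<string, float>> Rank(
        IEnumerable<KeyValuePair<string, float>> scores)
    {
        return scores.OrderByDescending(pair => pair.Value);
    }

    // Returns the name of the top-ranked emotion, or null when the
    // sequence is empty.
    public static string TopEmotion(
        IEnumerable<KeyValuePair<string, float>> scores)
    {
        // FirstOrDefault on an empty sequence yields default(KeyValuePair),
        // whose Key is null because KeyValuePair is a struct.
        return Rank(scores).FirstOrDefault().Key;
    }
}
```

Passing a Dictionary<string, float> holding the values from the ranked list (Happiness at 1, all others close to zero) makes TopEmotion return "Happiness". An empty input yields null, so a caller may want to check for that before showing the value in the UI.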

The final block in the method body returns the instance of the FaceEmotionDetection class and implements exception handling:

    return faceEmotionDetection;
  }
  catch (Exception ex)
  {
    await DisplayAlert("Error", ex.Message, "OK");
    return null;
  }
}

The last thing you have to do is invoke the custom DetectFaceAndEmotionsAsync method. You can do this inside both Clicked event handlers, just before setting the IsRunning and IsVisible properties of the ActivityIndicator control to false:

FaceEmotionDetection theData = await DetectFaceAndEmotionsAsync(file);
this.BindingContext = theData;
this.Indicator1.IsRunning = false;
this.Indicator1.IsVisible = false;

The BindingContext property of the page receives an instance of the FaceEmotionDetection class as the data source, and data-bound child controls will automatically show the related information. With patterns like Model-View-ViewModel, you would wrap the result with a ViewModel class. After a lot of work, you’re ready to test the application.
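For readers curious about the Model-View-ViewModel option mentioned above, here is a minimal sketch of what such a wrapper might look like. FaceEmotionViewModel and its Result and IsBusy properties are hypothetical and aren’t part of the sample application; the FaceEmotionDetection class is repeated only to keep the listing self-contained.

```csharp
using System.ComponentModel;

// Same shape as the FaceEmotionDetection class defined earlier.
public class FaceEmotionDetection
{
    public string Emotion { get; set; }
    public double Smile { get; set; }
    public string Glasses { get; set; }
    public string Gender { get; set; }
    public double Age { get; set; }
    public double Beard { get; set; }
    public double Moustache { get; set; }
}

// Hypothetical ViewModel that exposes the detection result and a busy flag.
public class FaceEmotionViewModel : INotifyPropertyChanged
{
    private FaceEmotionDetection result;
    private bool isBusy;

    public event PropertyChangedEventHandler PropertyChanged;

    // The latest detection result; raises a change notification when set.
    public FaceEmotionDetection Result
    {
        get { return result; }
        set { result = value; OnPropertyChanged(nameof(Result)); }
    }

    // True while a detection is in progress.
    public bool IsBusy
    {
        get { return isBusy; }
        set { isBusy = value; OnPropertyChanged(nameof(IsBusy)); }
    }

    private void OnPropertyChanged(string propertyName)
    {
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
    }
}
```

With a class like this as the BindingContext, the labels would bind to paths such as Result.Gender, and the ActivityIndicator could bind to IsBusy instead of being toggled from the code-behind.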

Testing the Application

Select the platform of your choice and press F5. If you use the Microsoft emulators, you can take advantage of the emulator tools to select a physical webcam to take pictures, and you can simulate an SD card to upload files. Figure 1 shows the result of the detection on a picture of me, on an Android device and on Windows 10 running in desktop mode.

The Face and Emotion APIs did an amazing job because the returned values are very close to the truth, though still approximate. It’s worth mentioning that the FaceEmotionDetection class has some properties of type double, such as Smile, Beard and Moustache. They return numeric values, which might not make much sense for the end user in a real-world app. So, in case you want to convert those numeric values into human-readable strings, you might consider implementing value converters based on the IValueConverter interface.
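As a rough illustration of the conversion logic such a converter could contain, the following sketch maps a score between 0 and 1 to a short label. LikelihoodFormatter and Describe are hypothetical names, and the 0.5 and 0.1 thresholds are arbitrary choices made for this example.

```csharp
using System;
using System.Globalization;

public static class LikelihoodFormatter
{
    // Maps a score between 0 and 1 to a short label. The 0.5 and 0.1
    // thresholds are arbitrary choices for illustration.
    public static string Describe(double score)
    {
        if (double.IsNaN(score) || score < 0 || score > 1)
            throw new ArgumentOutOfRangeException(nameof(score));

        // Whole-number percentage, formatted independently of the culture.
        int percent = (int)Math.Round(score * 100);
        string text = percent.ToString(CultureInfo.InvariantCulture) + "%";

        if (score >= 0.5) return "Yes (" + text + ")";
        if (score >= 0.1) return "Maybe (" + text + ")";
        return "No (" + text + ")";
    }
}
```

In a Xamarin.Forms app, the Convert method of an IValueConverter implementation could cast the bound value to double and return the result of Describe, while the binding in XAML would reference the converter.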

Implementing Network Connectivity Check

A well-designed app that needs to access resources on the Internet should always check for connection availability first. As with accessing the camera and the file system, checking for connection availability in Xamarin.Forms would require platform-specific code. Fortunately, the Connectivity plug-in comes in to help, providing a shared way to perform this check from the PCL project directly. The plug-in offers a class called CrossConnectivity with its Current property that represents a singleton instance of the class. It exposes a bool property called IsConnected that simply returns true if a connection is available. To check for network availability in the sample application, simply place the following check at the beginning of the DetectFaceAndEmotionsAsync method:

private async Task<FaceEmotionDetection>
  DetectFaceAndEmotionsAsync(MediaFile inputFile)
{
  if (!CrossConnectivity.Current.IsConnected)
  {
    await DisplayAlert("Network error",
      "Please check your network connection and retry.", "OK");
    return null;
  }
  // The try block shown earlier follows here

The class also exposes the following interesting members:

  • ConnectivityChanged: An event that’s raised when the connection state changes. You can subscribe to this event and get information on the connectivity status via an object of type ConnectivityChangedEventArgs.
  • BandWidths: A property that returns a list of available bandwidths for the current platform.

Additional information about the Connectivity plug-in can be found in the plug-in’s documentation.

Wrapping Up

Microsoft Cognitive Services provide RESTful services and rich APIs based on machine learning that let you create the next generation of apps. By combining the power of these services with Xamarin, you’ll be able to bring natural user interaction to your cross-platform apps for Android, iOS and Windows, offering customers an amazing experience.

Alessandro Del Sole has been a Microsoft MVP since 2008. Awarded MVP of the Year five times, he has authored many books, eBooks, instructional videos and articles about .NET development with Visual Studio. Del Sole works as a solution developer expert for Brain-Sys, focusing on .NET development, training and consulting. You can follow him on Twitter: @progalex.

Thanks to the following technical experts for reviewing this article: James McCaffrey and James Montemagno
