November 2016

Volume 31 Number 11

[Cognitive Services]

Seeing the World with Xamarin and Microsoft Computer Vision APIs

By Alessandro Del Sole

In my last article, I provided a brief introduction to Microsoft Cognitive Services, describing the available RESTful APIs, and showcasing the Face and Emotion APIs in a cross-platform app written with Xamarin.Forms and C# (msdn.microsoft.com/magazine/mt742868). In this article, I’ll discuss another important set of APIs, known as Computer Vision. You’ll want to read the previous article before you go on with this one because I’m going to assume you’re familiar with some concepts about Cognitive Services I explained there, and because I’ll reuse some NuGet packages and code snippets from the previous sample app. With that said, let’s start by describing the Microsoft Computer Vision APIs.

Getting Started with the Computer Vision APIs

The Computer Vision APIs allow images to be described and analyzed using natural language. You can upload a picture to the Computer Vision service or point to an image URL, and expect a fully natural description back, without the need to construct and format descriptions on your own. And that’s not all. Computer Vision can perform Optical Character Recognition (OCR) over an image that contains text, and it can scan an image to detect faces of celebrities. As with other services, Computer Vision is based on machine learning and supports REST, which means you perform HTTP requests and get back a JSON response. The JSON in Figure 1 shows an excerpt from the response of Computer Vision analysis over a picture, taken from the official documentation at bit.ly/2a45kQI.

Figure 1 Result of Computer Vision Analysis of an Image

{
  "description": {
    "tags": [
      "person",
      "man",
      "outdoor",
      "window",
    ],
    "captions": [
      {
        "text": "Satya Nadella sitting in front of a building",
        "confidence": 0.38035155997373377
      }
    ]
  },
  "requestId": "ed2de1c6-fb55-4686-b0da-4da6e05d283f",
  "metadata": {
    "width": 1500,
    "height": 1000,
    "format": "Jpeg"
  }
}

As you can see, the response contains a natural language description of what the celebrity in the picture is doing, along with additional information such as tags, a request ID, and the picture's size and format. And this is just some of the information you can get from the Computer Vision service.
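
If you're curious about what such a request looks like at the HTTP level, the following sketch posts an image stream to the analyze endpoint with HttpClient and returns the raw JSON shown in Figure 1. This is only a minimal sketch: the base URL is region-specific and the visualFeatures query string shown here is an assumption, so check your subscription details for the exact endpoint. The secret key you'll obtain when subscribing (see the next section) travels in the Ocp-Apim-Subscription-Key header.

using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class RawVisionClient
{
  // Hypothetical endpoint; replace the region and the features with your own
  private const string Endpoint =
    "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze" +
    "?visualFeatures=Description,Tags";

  public static async Task<string> AnalyzeAsync(Stream image, string subscriptionKey)
  {
    using (var client = new HttpClient())
    using (var content = new StreamContent(image))
    {
      // The secret key goes in this header on every request
      client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
      content.Headers.ContentType =
        new MediaTypeHeaderValue("application/octet-stream");
      var response = await client.PostAsync(Endpoint, content);
      // The response body is the JSON document shown in Figure 1
      return await response.Content.ReadAsStringAsync();
    }
  }
}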

Performing HTTP requests certainly works, but as a .NET developer, you might prefer a different approach. As with the Face and Emotion APIs, Microsoft also offers a portable client library (PCL) for Computer Vision that you can use in your C# applications, including .NET and Xamarin, which lets you invoke the service through convenient methods using an object-oriented approach. I’ll use this library shortly.

Subscribing to the Computer Vision APIs

As with other Cognitive Services, to use the Computer Vision APIs you must register the service and get a secret key to use in your code. To accomplish this, simply go to the Subscriptions page (bit.ly/2b2rKDO) and request a new trial for the Computer Vision APIs. Figure 2 shows how your subscription appears after registration.

Figure 2 Registering for the Computer Vision APIs

As you can see, you get two secret keys. You’ll need one later when it’s time to write C# code.

Creating a Xamarin.Forms App and Installing NuGet Packages

Launch Visual Studio 2015 and create a new Xamarin.Forms project using the Blank XAML App (Xamarin.Forms Portable) project template. Call the new project ComputerVisionSample and click OK. When ready, install the following NuGet packages:

Microsoft.ProjectOxford.Vision installs the client library for the Computer Vision APIs and must be installed to the PCL project only.

Xam.Plugin.Connectivity contains the Connectivity plug-in for Xamarin.Forms and must be installed to all the projects in the solution. It will be used to check for a network connection before attempting to make requests over the Internet.

Xam.Plugin.Media contains the Media plug-in for Xamarin.Forms and must be installed to all the projects in the solution. It will be used to take and select pictures from shared code, instead of having to write platform-specific code.

Make sure you build the solution at this point, so that all references will be refreshed. Now let’s dive into the Computer Vision APIs, analyzing the three key scenarios.

Describing Pictures with the Analysis APIs

The Computer Vision client library exposes a class called Microsoft.ProjectOxford.Vision.VisionServiceClient, which is the object you use to send requests to the service, and that exposes properties containing the analysis result. This class will be used in all the scenarios I target. The first scenario is describing pictures, by which I mean obtaining a description of what the picture represents, based on natural, human-readable language. The response the service sends back also contains information, such as dominant colors, faces detected, tags, image type and size, and also whether a picture contains adult or racy content. In order to describe a picture, the VisionServiceClient class exposes two methods: AnalyzeImageAsync and DescribeAsync. The latter provides a smaller set of information, and is typically used to retrieve only a natural language description of the picture, whereas AnalyzeImageAsync returns more detailed information. Both methods store their response into an object of type Microsoft.ProjectOxford.Vision.Contract.AnalysisResult. I’ll be using AnalyzeImageAsync in this article. This method has two overloads, one accepting a stream and one accepting a URL pointing to an image; both overloads require specifying the set of information you want to retrieve from the picture. This set of information is represented by an array of values from the Microsoft.ProjectOxford.Vision.VisualFeature enumeration. In the MainPage.xaml page of the sample application, I’ll implement the picture description. Figure 3 shows how to implement a detailed analysis method.

Figure 3 Analyzing a Picture for Description

private async Task<AnalysisResult> AnalyzePictureAsync(Stream inputFile)
{
  // Use the Connectivity plug-in to detect whether
  // a network connection is available
  // (requires the using Plugin.Connectivity directive)
  if (!CrossConnectivity.Current.IsConnected)
  {
    await DisplayAlert("Network error",
      "Please check your network connection and retry.", "OK");
    return null;
  }
  VisualFeature[] visualFeatures = new VisualFeature[] { VisualFeature.Adult,
    VisualFeature.Categories, VisualFeature.Color, VisualFeature.Description,
    VisualFeature.Faces, VisualFeature.ImageType, VisualFeature.Tags };
  AnalysisResult analysisResult =
    await visionClient.AnalyzeImageAsync(inputFile,
    visualFeatures);
  return analysisResult;           
}

Notice how the code uses the Connectivity plug-in to detect a network connection, as I explained in my previous article. There are three key points in Figure 3. The first point concerns the information you want to retrieve. The array of VisualFeature values contains the most detailed list of information possible, and includes values from the VisualFeature enumeration. These have self-explanatory names and include detection of adult and racy content, a list of categories for the picture, dominant colors, a natural language description, a list of faces, image information and a list of tags. I’ve included the Faces value for the sake of completeness, but I won’t actually be using this result because you can retrieve more detailed information about faces using the Face API. The second key point is the invocation of AnalyzeImageAsync, which sends a stream to the service together with the list of information you want to retrieve, and stores the response into an object of type AnalysisResult. This class is the third key point and is defined in Figure 4.

Figure 4 The AnalysisResult Class Definition

namespace Microsoft.ProjectOxford.Vision.Contract
{
  public class AnalysisResult
  {
    public AnalysisResult();
    public Adult Adult { get; set; }
    public Category[] Categories { get; set; }
    public Color Color { get; set; }
    public Description Description { get; set; }
    public Face[] Faces { get; set; }
    public ImageType ImageType { get; set; }
    public Metadata Metadata { get; set; }
    public Guid RequestId { get; set; }
    public Tag[] Tags { get; set; }
  }
}

To see this definition, right-click the AnalysisResult type in the code editor and select Go To Definition (or Peek Definition if you don’t want to leave the active window). As you can see, the definition exposes properties that contain the required information through specialized objects. By using Go To Definition on each property type, you can understand how each specialized object is defined and, therefore, how you can use it in your app (with data binding, for example). For instance, the Description type is defined as follows:

public class Description
{
  public Description();
  public Caption[] Captions { get; set; }
  public string[] Tags { get; set; }
}

Here, the most important property is Captions, an array of Caption objects. Each Caption contains a human-readable description the service retrieved from the picture, which is offered through its Text property. The Adult class is defined as follows:

public class Adult
{
  public Adult();
  public double AdultScore { get; set; }
  public bool IsAdultContent { get; set; }
  public bool IsRacyContent { get; set; }
  public double RacyScore { get; set; }
}

This is a simpler class and exposes two bool properties that return true if the picture contains adult or racy content, plus two other properties that represent the confidence for that result. This is particularly useful when you want to restrict content availability. Now take a look at the definition of the Color class:

public class Color
{
  public Color();
  public string AccentColor { get; set; }
  public string DominantColorBackground { get; set; }
  public string DominantColorForeground { get; set; }
  public string[] DominantColors { get; set; }
  public bool IsBWImg { get; set; }
}

This class is used to store a list of colors detected in the picture, such as the accent color, dominant foreground and background colors, and an array of dominant colors. It also exposes a property called IsBWImg, of type bool, which returns true if the picture is black and white. Understanding how these objects are defined and the properties they expose will help when you want to present information in the UI via data binding. I’ll leave it to you to explore the definition of the other classes that AnalysisResult uses to store the analysis information. As it is, the AnalysisResult instance can be data-bound to some UI elements to show information very easily, and I’ll get to this shortly. Consider Figure 5, which shows the full listing for the XAML required to define the UI for the sample app.

Figure 5 The UI Definition for Image Description

<?xml version="1.0" encoding="utf-8" ?>
<ContentPage xmlns="https://xamarin.com/schemas/2014/forms"
             xmlns:x="https://schemas.microsoft.com/winfx/2009/xaml"
             xmlns:local="clr-namespace:ComputerVisionSample"
             x:Class="ComputerVisionSample.MainPage">
  <StackLayout Orientation="Vertical">
    <Button x:Name="TakePictureButton" Clicked="TakePictureButton_Clicked"
      Text="Take from camera"/>
    <Button x:Name="UploadPictureButton" Clicked="UploadPictureButton_Clicked"
      Text="Pick a photo"/>
    <ActivityIndicator x:Name="Indicator1" IsVisible="False" IsRunning="False" />
    <Image x:Name="Image1" HeightRequest="240" />
  <ScrollView Padding="10">
    <StackLayout>
      <StackLayout Orientation="Horizontal">
        <Label Text="Adult content: "/>
        <Label Text="{Binding Adult.IsAdultContent}"/>
      </StackLayout>
      <StackLayout Orientation="Horizontal">
        <Label Text="Racy content: "/>
        <Label Text="{Binding Adult.IsRacyContent}"/>
      </StackLayout>
      <StackLayout Orientation="Horizontal">
        <Label Text="Description: "/>
        <Label Text="{Binding Description.Captions[0].Text}"/>
      </StackLayout>
      <StackLayout Orientation="Horizontal">
        <Label Text="Accent color: "/>
        <Label Text="{Binding Color.AccentColor}"/>
      </StackLayout>
      <StackLayout Orientation="Horizontal">
        <Label Text="Tags: "/>
        <ListView ItemsSource="{Binding Tags}">
          <ListView.ItemTemplate>
            <DataTemplate>
              <ViewCell>
                <Label Text="{Binding Name}"/>
              </ViewCell>
            </DataTemplate>
          </ListView.ItemTemplate>
        </ListView>
      </StackLayout>      
    </StackLayout>
  </ScrollView>
</ContentPage>

The UI defines two buttons, one for selecting a picture from the device and one for taking a picture from the camera, plus an ActivityIndicator that shows that an operation is in progress. The selected picture is displayed within an Image control. (I used these controls in the previous article, too.) Notice how data binding is expressed within Label controls. For example, you can bind the Text property directly to Adult.IsAdultContent and Adult.IsRacyContent from the AnalysisResult instance instead of performing complex bindings. Similarly, you can retrieve the natural language description of a picture by binding directly to the Text property of the first Caption object in the collection of Captions exposed by the AnalysisResult.Description property. Of course, this is fine if there’s just one caption or if you just want to see the first result. However, Captions might contain multiple Caption objects and, in that case, you might want to choose a different data binding. The same direct binding is made against the Color.AccentColor property. For Tags, the UI shows the list of tags with a ListView control, with a data template that presents a Label for each tag name. In the code-behind, you must first implement two event handlers for the buttons, as shown in Figure 6. The code uses the Media plug-in I already described in my previous article, so I won’t cover it here.

Figure 6 Clicked Event Handlers for the Buttons

private async void TakePictureButton_Clicked(object sender, EventArgs e)
{
  await CrossMedia.Current.Initialize();
  if (!CrossMedia.Current.IsCameraAvailable || !CrossMedia.Current.IsTakePhotoSupported)
  {
    await DisplayAlert("No Camera", "No camera available.", "OK");
    return;
  }
  var file = await CrossMedia.Current.TakePhotoAsync(new StoreCameraMediaOptions
  {
    SaveToAlbum = true,
    Name = "test.jpg"
  });
  if (file == null)
      return;
  this.Indicator1.IsVisible = true;
  this.Indicator1.IsRunning = true;
  Image1.Source = ImageSource.FromStream(() => file.GetStream());
  var analysisResult = await AnalyzePictureAsync(file.GetStream());
  this.BindingContext = analysisResult;
  this.Indicator1.IsRunning = false;
  this.Indicator1.IsVisible = false;
}
private async void UploadPictureButton_Clicked(object sender, EventArgs e)
{
  if (!CrossMedia.Current.IsPickPhotoSupported)
  {
    await DisplayAlert("No upload", "Picking a photo is not supported.", "OK");
    return;
  }
  var file = await CrossMedia.Current.PickPhotoAsync();
  if (file == null)
      return;
  this.Indicator1.IsVisible = true;
  this.Indicator1.IsRunning = true;
  Image1.Source = ImageSource.FromStream(() => file.GetStream());
  var analysisResult = await AnalyzePictureAsync(file.GetStream());
  this.BindingContext = analysisResult;
  this.Indicator1.IsRunning = false;
  this.Indicator1.IsVisible = false;
}

The key point here is the invocation of the AnalyzePictureAsync method, whose result (an instance of AnalysisResult) is assigned to the page as its data source. This enables the data bindings of the UI elements seen in Figure 5. Then you need to declare and instantiate the VisionServiceClient class as follows:

private readonly VisionServiceClient visionClient;
public MainPage()
{
  InitializeComponent();
  this.visionClient =
    new VisionServiceClient("YOUR-KEY-GOES-HERE");
}

Notice that you need to supply one of the secret keys you got when registering for the Computer Vision APIs. Finally, remember to add the proper permissions in the app’s manifests. For instance, the Universal Windows Platform (UWP) project requires the Internet, Webcam and Pictures Library capabilities; and the Android project requires the INTERNET, CAMERA, READ_EXTERNAL_STORAGE and WRITE_EXTERNAL_STORAGE permissions. Now you can start the application on your favorite device or emulator. Figure 7 shows the UWP version running in desktop mode, with an image and the requested information.

Figure 7 Describing a Picture with the Computer Vision APIs

Among all of the available information, you’ll probably be most impressed by the content of the Description property of the AnalysisResult class, which provides an auto-generated, human-readable description with no effort.
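
If the description is all you need, you don’t have to request the full set of visual features: the DescribeAsync method mentioned earlier returns an AnalysisResult with just the Description populated, and you can pick the caption the service is most confident about. The following is a minimal sketch; the maxCandidates parameter is an assumption about the overload your version of the client library exposes, so adjust it as needed.

// A minimal sketch: ask for up to three candidate captions and keep the one
// with the highest confidence. Requires using System.Linq,
// Microsoft.ProjectOxford.Vision.Contract and Plugin.Connectivity.
private async Task<string> GetBestCaptionAsync(Stream inputFile)
{
  if (!CrossConnectivity.Current.IsConnected)
    return null;
  AnalysisResult result =
    await visionClient.DescribeAsync(inputFile, maxCandidates: 3);
  Caption bestCaption = result?.Description?.Captions?
    .OrderByDescending(caption => caption.Confidence)
    .FirstOrDefault();
  return bestCaption?.Text;
}

The returned string can then be assigned to a Label, or bound exactly as the sample does with Description.Captions[0].Text.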

Retrieving Text from Pictures with OCR

OCR is the electronic conversion of an image of text into editable text. Most scanners ship with OCR software that lets you produce editable documents from images containing text, such as magazine pages. The set of Computer Vision APIs offers an OCR service that can retrieve text from within images, no matter the language of the text. The result of OCR is, essentially, a set of strings. To understand how the OCR APIs work, let’s add a new XAML page to the PCL project. In Solution Explorer, right-click the ComputerVisionSample (Portable) project, select Add | New Item, and then in the Add New Item dialog, select the Forms Xaml Page item available in the Cross-Platform node. Call the new page OcrRecognitionPage. The VisionServiceClient class exposes a method called RecognizeTextAsync, which performs OCR on an image. This method accepts either a stream or a URL pointing to an image, and you can optionally specify the language. If you don’t specify a language, RecognizeTextAsync will attempt to automatically detect the language. It returns an object of type Microsoft.ProjectOxford.Vision.Contract.OcrResults, which is a bit complex and deserves more explanation. For now, consider the following code, which invokes the OCR service over a stream and auto-detects the language:

private async Task<OcrResults> AnalyzePictureAsync(Stream inputFile)
{
  if (!CrossConnectivity.Current.IsConnected)
  {
    await DisplayAlert("Network error",
      "Please check your network connection and retry.", "OK");
    return null;
  }
  OcrResults ocrResult = await visionClient.RecognizeTextAsync(inputFile);
  return ocrResult;
}

Notice how you always check for network connectivity first. If you want to specify a language, you pass to RecognizeTextAsync an instance of the RecognizeLanguage class, as follows:

OcrResults ocrResult =
  await visionClient.RecognizeTextAsync(inputFile,
    new RecognizeLanguage() { ShortCode = "it", LongName = "Italian" });

The official sample application for WPF at bit.ly/2ahHum3 shows the full list of supported languages and codes. The OcrResults class is defined as follows:

public class OcrResults
{
  public OcrResults();
  public string Language { get; set; }
  public string Orientation { get; set; }
  public Region[] Regions { get; set; }
  public double? TextAngle { get; set; }
}

The Language, Orientation and TextAngle properties represent, respectively, the detected language and the orientation and angle of the recognized text. Regions is an array of Region objects. Each Region represents an area of the image that contains text, and the Region type has a property called Lines, an array of Line objects, each representing a single line of text in the region. Each Line object has a property called Words, an array of Word objects, each representing a single word in the line. This is a slightly complex hierarchy, but it provides very fine-grained results in that you can work with every single word the API detects. Don’t forget to use Go To Definition to investigate each class’s definition. Because of this complexity, some parts of the UI will be generated at run time. For now, in the XAML for the new page, add the code shown in Figure 8, which declares some familiar controls (two buttons, an Image, an ActivityIndicator) and a StackLayout that will receive the list of lines and words detected. Notice how the code also adds a Label control to display the detected language.

Figure 8 Preparing the UI for Optical Character Recognition

<?xml version="1.0" encoding="utf-8" ?>
<ContentPage xmlns="https://xamarin.com/schemas/2014/forms"
             xmlns:x="https://schemas.microsoft.com/winfx/2009/xaml"
             x:Class="ComputerVisionSample.OcrRecognitionPage">
  <StackLayout Orientation="Vertical">
    <Button x:Name="TakePictureButton" Clicked="TakePictureButton_Clicked"
      Text="Take from camera"/>
    <Button x:Name="UploadPictureButton" Clicked="UploadPictureButton_Clicked"
      Text="Pick a photo"/>
    <ActivityIndicator x:Name="Indicator1" IsVisible="False" IsRunning="False" />
    <Image x:Name="Image1" HeightRequest="240" />
    <StackLayout Orientation="Horizontal">
      <Label Text="Language: "/>
      <Label Text="{Binding Language}"/>
    </StackLayout>
    <ScrollView>
      <StackLayout x:Name="DetectedText">
      </StackLayout>
    </ScrollView>
  </StackLayout>
</ContentPage>

As I mentioned, some parts of the UI will be constructed at run time. More specifically, I need to iterate the Regions array, then its nested Lines array, to detect every single word in the Words array. This is demonstrated in Figure 9, where a StackLayout is generated for each line.

Figure 9 Iterating Text Regions and Lines

private void PopulateUIWithRegions(OcrResults ocrResult)
{
  // Iterate the regions
  foreach (var region in ocrResult.Regions)
  {
    // Iterate lines per region
    foreach (var line in region.Lines)
    {
      // For each line, add a panel
      // to present words horizontally
      var lineStack = new StackLayout
      { Orientation = StackOrientation.Horizontal };
      // Iterate words per line and add the word
      // to the StackLayout
      foreach (var word in line.Words)
      {
        var textLabel = new Label { Text = word.Text };
        lineStack.Children.Add(textLabel);
      }
      // Add the StackLayout to the UI
      this.DetectedText.Children.Add(lineStack);
    }
  }
}
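
If you also want the recognized text as a single string, say, to copy it to the clipboard or share it, the same Regions/Lines/Words hierarchy can be flattened with a little LINQ. The following helper is a minimal sketch of that idea and isn’t part of the article’s sample app:

// A minimal sketch: join every word of every line of every region into plain
// text, one line of text per detected line. Requires using System,
// System.Linq and Microsoft.ProjectOxford.Vision.Contract.
private static string GetTextFromOcrResult(OcrResults ocrResult)
{
  if (ocrResult?.Regions == null)
    return string.Empty;
  var lines = ocrResult.Regions
    .SelectMany(region => region.Lines)
    .Select(line => string.Join(" ", line.Words.Select(word => word.Text)));
  return string.Join(Environment.NewLine, lines);
}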

The remaining code is very simple. First, declare and instantiate the VisionServiceClient class as follows:

private readonly VisionServiceClient visionClient;
public OcrRecognitionPage()
{
  InitializeComponent();
  this.visionClient =
    new VisionServiceClient("YOUR-KEY-GOES-HERE");
}

You can certainly use the same key you used previously. Next, you can reuse both event handlers shown in Figure 6, where you have to replace the following lines:

var analysisResult = await AnalyzePictureAsync(
  file.GetStream());
this.BindingContext = analysisResult;

with the following new lines:

var ocrResult = await AnalyzePictureAsync(
  file.GetStream());
this.BindingContext = ocrResult;
PopulateUIWithRegions(ocrResult);

By doing so, you can bind the OcrResults instance to the UI and then the PopulateUIWithRegions method will generate new lines with the detected text. For the sake of simplicity, instead of implementing page navigation, you can simply change the startup page in the App.xaml.cs constructor as follows:

MainPage = new OcrRecognitionPage();

Now start the application again, by choosing your favorite emulator or device. If you select or take a picture, you’ll be able to read the text that’s printed on it, as shown in Figure 10.

Figure 10 Performing Optical Character Recognition on an Image

In this case, the sample app is running on an Android emulator. Notice how the language has been properly detected as English (en). It’s very important to note that the OCR service works well with high-quality images. If the image resolution is poor, the image is blurred, or it contains handwritten or cursive text, the service might return an inaccurate result. It’s also worth mentioning that the OCR service can detect words even on images with a multi-color background, not just a solid color. For instance, you can analyze text over the picture of a sunset.

Finding Celebrities with a Domain-Specific Model

In the first part of the article, I explained what the Computer Vision APIs offer to describe an image. Description is something that happens at a very high level, and returns general information from an image. Microsoft is also working on offering specialized recognition via the so-called domain-specific models. These allow the return of very specific information from an image, which can be combined with an image description. As of this writing, the only domain-specific model available is celebrity recognition. By using this model, you can take advantage of the Computer Vision APIs to detect celebrities in a picture. Generally speaking, with a model you can perform specialized analysis over specific categories of images. So my next and final example is recognizing celebrities within pictures. I can’t show celebrity pictures for copyright reasons, but you won’t have any problem testing the code. Let’s start by adding a new XAML page to the PCL project, called CelebrityRecognitionPage. Refer to the previous section for the steps required to add a page. The UI for this page is very simple: It just needs to display the celebrity name in a Label and, of course, it will offer the usual UI elements, as shown in Figure 11.

Figure 11 Preparing the UI for Celebrity Recognition

<?xml version="1.0" encoding="utf-8" ?>
<ContentPage xmlns="https://xamarin.com/schemas/2014/forms"
             xmlns:x="https://schemas.microsoft.com/winfx/2009/xaml"
             x:Class="ComputerVisionSample.CelebrityRecognitionPage">
  <StackLayout Orientation="Vertical">
    <Button x:Name="TakePictureButton" Clicked="TakePictureButton_Clicked"
      Text="Take from camera"/>
    <Button x:Name="UploadPictureButton" Clicked="UploadPictureButton_Clicked"
      Text="Pick a photo"/>
    <ActivityIndicator x:Name="Indicator1" IsVisible="False" IsRunning="False" />
    <Image x:Name="Image1" HeightRequest="240" />
    <Label x:Name="CelebrityName"/>
  </StackLayout>
</ContentPage>

Celebrity recognition is performed using a method from the VisionServiceClient class called AnalyzeImageInDomainAsync. This method requires the image stream or URL, plus the domain-specific model to use for detection. You can retrieve the list of available models by invoking the VisionServiceClient.ListModelsAsync method; though, as I mentioned, only the celebrity recognition model is available at the moment. The following code demonstrates how to retrieve the list of models and pick one specific model:

private async Task<Model> GetDomainModel()
{
  ModelResult modelResult = await visionClient.ListModelsAsync();
  // At this writing, only celebrity recognition
  // is available. It is the first model in the list
  return modelResult.Models.First();
}
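
Because the list of models doesn’t change between requests, you might not want to call ListModelsAsync every time you analyze a picture. The following variant is just a sketch of a possible optimization; the caching field and method name are mine, not part of the article’s sample:

// A minimal sketch: cache the Model instance so ListModelsAsync is invoked
// only once for the lifetime of the page.
private Model celebrityModel;

private async Task<Model> GetDomainModelCachedAsync()
{
  if (this.celebrityModel == null)
  {
    ModelResult modelResult = await visionClient.ListModelsAsync();
    // Celebrity recognition is the first (and currently only) model
    this.celebrityModel = modelResult.Models.First();
  }
  return this.celebrityModel;
}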

The next step is the code that performs recognition, as in the following custom method called AnalyzePictureAsync:

private async Task<AnalysisInDomainResult> AnalyzePictureAsync(Stream inputFile)
{
  if (!CrossConnectivity.Current.IsConnected)
  {
    await DisplayAlert("Network error",
      "Please check your network connection and retry.", "OK");
    return null;
  }
  AnalysisInDomainResult analysisResult =
    await visionClient.AnalyzeImageInDomainAsync(inputFile, await GetDomainModel());
  return analysisResult;
}

The result of image analysis is an object of type AnalysisInDomainResult with the following definition:

public class AnalysisInDomainResult
{
  public AnalysisInDomainResult();
  public Metadata Metadata { get; set; }
  public Guid RequestId { get; set; }
  public object Result { get; set; }
}

The Result property contains the actual result of recognition. As you can see, it’s of type System.Object, which means it contains raw data. More specifically, Result stores the JSON response returned by the Computer Vision service. Depending on the number of celebrities detected, this JSON can be very complex, and this is the reason it’s an object instead of a more specialized type. It basically defines an array of items, each containing the celebrity name, the position and size of the face rectangle, and a confidence value for the match. For instance, if the result contains one celebrity, the JSON will look similar to the following (where CelebrityName stands for the real celebrity name):

{"celebrities": [
  {
    "name": "CelebrityName",
    "faceRectangle": {
      "left": 169,
      "top": 148,
      "width": 186,
      "height": 186
    },
    "confidence": 0.9064959
  }
]}

If the JSON contains multiple celebrities, you can imagine how complex it can be. So an important problem to solve here is retrieving the celebrity name from an Object type. This can be done by using the popular Newtonsoft.Json library, which is a dependency of the Microsoft.ProjectOxford.Vision library and, therefore, is already available in the PCL project. The library provides an object called JObject, from the Newtonsoft.Json.Linq namespace, which allows parsing the JSON content stored inside a System.Object with a method called Parse. You can then query the parsed result and retrieve the desired elements with indexers or a LINQ query. The following method demonstrates how to retrieve a celebrity name from the analysis result:

private string ParseCelebrityName(object analysisResult)
{
  JObject parsedJSONresult = JObject.Parse(analysisResult.ToString());
  var celebrities = from celebrity in parsedJSONresult["celebrities"]
                    select (string)celebrity["name"];
  return celebrities.FirstOrDefault();
}

In this case, I’m assuming a picture contains only one celebrity, so the code invokes FirstOrDefault over the result of the LINQ query, but you can work with the query result to see how many celebrities have been detected. The next step is declaring and instantiating the VisionServiceClient class, again with the secret key:

private readonly VisionServiceClient visionClient;
public CelebrityRecognitionPage()
{
  InitializeComponent();
  this.visionClient = new VisionServiceClient("YOUR-KEY-GOES-HERE");
}

At this point, you can add the two event handlers for the buttons’ Clicked events. You can still reuse the code in Figure 6, just replacing the following line:

this.BindingContext = analysisResult;

with the following line:

this.CelebrityName.Text = ParseCelebrityName(analysisResult.Result);

You can now test the application by selecting a picture of your favorite celebrity and seeing whether the Computer Vision APIs recognize the person correctly.
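
If the picture contains more than one celebrity, you can project the full JSON array instead of stopping at the first name. The following sketch returns every detected name together with its confidence; the Tuple shape is just an illustration, not part of the article’s sample:

// A minimal sketch: list every celebrity in the result with its confidence.
// Requires using System, System.Collections.Generic, System.Linq and
// Newtonsoft.Json.Linq.
private IEnumerable<Tuple<string, double>> ParseCelebrities(object analysisResult)
{
  JObject parsedJSONresult = JObject.Parse(analysisResult.ToString());
  return from celebrity in parsedJSONresult["celebrities"]
         select Tuple.Create(
           (string)celebrity["name"],
           (double)celebrity["confidence"]);
}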

Wrapping Up

The Computer Vision APIs open up an incredible number of new opportunities, and they provide a really simple way to describe the world, using your app on any platform and on any device. At the Build 2016 conference, Microsoft presented the Seeing AI project, based on several Cognitive Services, including Computer Vision, which is showcased in a short video that gives you a realistic perception of what you can do. Watch it at bit.ly/1qk5ZkJ.


Alessandro Del Sole has been a Microsoft MVP since 2008. Awarded MVP of the Year five times, he has authored many books, eBooks, instructional videos and articles about .NET development with Visual Studio. Del Sole is internationally considered a Visual Studio expert, Windows Presentation Foundation and Visual Basic authority, plus he works as a solution developer expert for Brain-Sys (www.brain-sys.it), focusing on .NET development, training and consulting. You can follow him on Twitter: @progalex.

Thanks to the following Microsoft technical expert for reviewing this article: James McCaffrey
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Internet Explorer and Bing. Dr. McCaffrey can be reached at jammc@microsoft.com.

