Connecting DJI Drones with Azure Cognitive Services

1. Introduction

Microsoft Cognitive Services are designed to democratize AI by packaging it into discrete components that are easy for developers to use in their own projects. Combined with Cognitive Services, we can do much more with DJI drones by analyzing the images and video they capture. In this tutorial, we will show how to connect DJI drones with Azure Cognitive Services. The full project information can be found here: https://www.hackster.io/JiongShi/azure-cloud-services-for-dji-drones-d8d3a3. You can download the sample project from my GitHub page: https://github.com/shijiong/AzureCloudService4DJI. We use a DJI Mavic Air as the example aircraft in this demo.

2. Create a Basic UWP Application

Please refer to this tutorial to create the UWP application: Connecting DJI Drones with Azure IoT Hub. With the DJI Windows SDK, we can connect the UWP application to the drone and read real-time on-board sensor data, such as the three-axis velocities. In the rest of this article, we will demonstrate how to capture an image and upload it to Cognitive Services for computer vision analysis.
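As a quick reminder of what that tutorial covers, the sketch below registers the app with the DJI Windows SDK and subscribes to the flight controller's velocity updates. This is only a compressed sketch: the app key value and handler names are placeholders, and the exact handler signatures should be checked against the DJI Windows SDK documentation and the IoT Hub tutorial.

using DJI.WindowsSDK;

public MainPage()
{
    InitializeComponent();

    // Register the app with the DJI Windows SDK (the key below is a placeholder).
    DJISDKManager.Instance.SDKRegistrationStateChanged += Instance_SDKRegistrationEvent;
    DJISDKManager.Instance.RegisterApp("your-dji-app-key");
}

private void Instance_SDKRegistrationEvent(SDKRegistrationState state, SDKError resultCode)
{
    if (resultCode == SDKError.NO_ERROR)
    {
        // Once registration succeeds, subscribe to real-time velocity updates
        // from the flight controller at index (0, 0).
        DJISDKManager.Instance.ComponentManager
            .GetFlightControllerHandler(0, 0).VelocityChanged += FlightController_VelocityChanged;
    }
}

private void FlightController_VelocityChanged(object sender, Velocity3D? value)
{
    if (value.HasValue)
    {
        // Three-axis velocities reported by the aircraft.
        System.Diagnostics.Debug.WriteLine($"Vx={value.Value.x}, Vy={value.Value.y}, Vz={value.Value.z}");
    }
}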

3. Create Computer Vision Resources

Follow this tutorial to create a Computer Vision resource and obtain its key and endpoint. You will need the key and endpoint from the resource you create to connect your application to the Computer Vision service; you'll paste them into the code later in the tutorial.
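For reference, the later code accesses these values through two fields, key_face and face_apiroot (the field names come from the sample code; the declarations and values below are placeholders you replace with your own key and endpoint):

// Paste your own key and endpoint here (placeholder values shown).
private string key_face = "<your-cognitive-services-key>";
private string face_apiroot = "https://<your-resource-name>.cognitiveservices.azure.com";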

4. Add Screen Capture Capability

The real-time video from the aircraft is displayed on the SwapChainPanel. However, the SwapChainPanel does not support taking a snapshot directly. Hence, we resort to the Windows.Graphics.Capture APIs to acquire frames from the application window. This official document will guide you through implementing screen capture in your application: Screen capture. The APIs found in the Windows.Graphics.Capture namespace require a capability to be declared in your application's manifest:

  • Open Package.appxmanifest in the Solution Explorer.
  • Select the Capabilities tab.
  • Check Graphics Capture.

This is shown in Fig. 1.

Fig. 1 Add Windows.Graphics.Capture Capability
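Because the Windows.Graphics.Capture APIs are only present on recent Windows 10 releases, it can also be worth checking for support at runtime before enabling the feature. A minimal sketch (the button name is an assumption based on the click handler shown later):

using Windows.Graphics.Capture;

// Disable the screenshot feature if the capture APIs are not available on this device.
if (!GraphicsCaptureSession.IsSupported())
{
    ScreenshotButton.IsEnabled = false;
}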

5. Import Win2D APIs

In this demo, we convert each Direct3D11CaptureFrame into a CanvasBitmap, which is part of the Win2D APIs. Once we have the CanvasBitmap, we can save it as an image file. To use the Win2D APIs, import the NuGet package directly: search for Win2D.uwp on the Browse tab and install it as shown in Fig. 2.

Fig. 2 Add Win2D.uwp NuGet Package
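The capture code in the following sections relies on a Win2D CanvasDevice and a few fields. The field names match the later snippets; their exact declarations and initial values are assumptions of this sketch:

using Microsoft.Graphics.Canvas;          // Win2D: CanvasDevice, CanvasBitmap
using Windows.Graphics;                   // SizeInt32
using Windows.Graphics.Capture;           // GraphicsCaptureItem, Direct3D11CaptureFramePool, GraphicsCaptureSession
using Windows.Graphics.DirectX;           // DirectXPixelFormat

// Fields used by the screenshot and save code below.
private CanvasDevice _canvasDevice = CanvasDevice.GetSharedDevice();
private GraphicsCaptureItem _item;
private Direct3D11CaptureFramePool _framePool;
private GraphicsCaptureSession _session;
private SizeInt32 _lastSize;
private CanvasBitmap _currentFrame;
private string _screenshotFilename = "screenshot.png"; // placeholder file name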

6. Add Microsoft Cognitive Services APIs

To work with Microsoft Cognitive Services, we import the required client library, which provides access to the Microsoft Cognitive Services APIs. In this demo, we make use of the Face APIs, which can be installed through the NuGet Package Manager as shown in Fig. 3.

Fig. 3 Add Microsoft Cognitive Services NuGet Package

Note that the Face API client library is still in a prerelease state, so remember to check the “Include prerelease” option.
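After installing the package, the types used in the code below (FaceClient, ApiKeyServiceClientCredentials, DetectedFace, FaceAttributeType) come from the following namespaces, assuming the Microsoft.Azure.CognitiveServices.Vision.Face client library:

using Microsoft.Azure.CognitiveServices.Vision.Face;        // FaceClient, ApiKeyServiceClientCredentials
using Microsoft.Azure.CognitiveServices.Vision.Face.Models; // DetectedFace, FaceAttributeType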

7. Add Image and Canvas Control to the UI

It is quite straightforward to display the image and the analysis results on the screen once the Computer Vision service returns the JSON data. Hence, we add an Image control to display the captured image and a Canvas control to overlay the recognition results on the image. Add the following code to MainPage.xaml.

<RelativePanel Name="spMain" Grid.Row="2">
    <Image Name="imgPhoto"
           RelativePanel.AlignLeftWithPanel="True"
           RelativePanel.AlignRightWithPanel="True"
           RelativePanel.AlignTopWithPanel="True"
           RelativePanel.AlignBottomWithPanel="True"
           Stretch="Uniform">
    </Image>
    <Canvas Name="cvasMain"
            RelativePanel.AlignLeftWithPanel="True"
            RelativePanel.AlignRightWithPanel="True"
            RelativePanel.AlignTopWithPanel="True"
            RelativePanel.AlignBottomWithPanel="True">
        <Popup Name="emotionData" IsOpen="False" IsLightDismissEnabled="True">
        </Popup>
    </Canvas>
</RelativePanel>

8. Take a Screenshot

Once the user clicks the “Screenshot” button, the application starts the GraphicsCaptureItem picking process. After the user finishes choosing, we convert the Direct3D11CaptureFrame into a CanvasBitmap, which is saved as an image file. The image file is then uploaded to the Azure Cognitive Services Face API. The ScreenshotButton_ClickAsync method is defined as follows.

private async void ScreenshotButton_ClickAsync(object sender, RoutedEventArgs e)
{
    var picker = new GraphicsCapturePicker();
    GraphicsCaptureItem item = await picker.PickSingleItemAsync();
    if (item != null)
    {
        _item = item;
        _lastSize = _item.Size;

        if (_lastSize.Height != 0)
        {
            _framePool = Direct3D11CaptureFramePool.Create(
               _canvasDevice, // D3D device
               DirectXPixelFormat.B8G8R8A8UIntNormalized, // Pixel format
               2, // Number of frames
               _item.Size); // Size of the buffers

            _session = _framePool.CreateCaptureSession(_item);
            _session.StartCapture();
            // Give the frame pool a moment to produce the first frame.
            Thread.Sleep(200);
            var frame = _framePool.TryGetNextFrame();
            if (frame != null)
            {
                // Convert our D3D11 surface into a Win2D object.
                CanvasBitmap canvasBitmap = CanvasBitmap.CreateFromDirect3D11Surface(
                    _canvasDevice,
                    frame.Surface);

                _currentFrame = canvasBitmap;
                await SaveImageAsync(_screenshotFilename, _currentFrame);
            }
            _session?.Dispose();
            _framePool?.Dispose();
            _item = null;
            _session = null;
            _framePool = null;
        }
    }
}

9. Save Image and Display the Results

When the application gets the frame, we can save it to local storage and also upload it to the Cognitive Services Face API to analyze the image. To implement this, we define SaveImageAsync as follows.

private async Task SaveImageAsync(string filename, CanvasBitmap frame)
{
    StorageFolder pictureFolder = KnownFolders.SavedPictures;

    StorageFile file = await pictureFolder.CreateFileAsync(
        filename,
        CreationCollisionOption.ReplaceExisting);

    using (var fileStream = await file.OpenAsync(FileAccessMode.ReadWrite))
    {
        await frame.SaveAsync(fileStream, CanvasBitmapFileFormat.Png, 1f);
    }

    var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read);
    var stream_send = stream.CloneStream();
    var stream_send2 = stream.CloneStream();
    var image = new BitmapImage();
    image.SetSource(stream);
    imgPhoto.Source = image;
    size_image = new Size(image.PixelWidth, image.PixelHeight);

    ringLoading.IsActive = true;

    // Face service: the client needs the subscription key credentials and the endpoint.
    FaceClient f_client = new FaceClient(
        new ApiKeyServiceClientCredentials(key_face),
        new System.Net.Http.DelegatingHandler[] { });
    f_client.Endpoint = face_apiroot;

    var requiredFaceAttributes = new FaceAttributeType[] {
                        FaceAttributeType.Age,
                        FaceAttributeType.Gender,
                        FaceAttributeType.Smile,
                        FaceAttributeType.FacialHair,
                        FaceAttributeType.HeadPose,
                        FaceAttributeType.Emotion,
                        FaceAttributeType.Glasses
                        };
    var faces_task = await f_client.Face.DetectWithStreamAsync(stream_send.AsStream(), true, true, requiredFaceAttributes);

    faces = faces_task.ToArray();

    if (faces != null)
    {
        DisplayEmotionsData(faces);
        DisplayFacesData(faces);
    }

    //hide preview
    if (swapChainPanel.Visibility == Visibility.Collapsed)
    {
        swapChainPanel.Visibility = Visibility.Visible;
        spMain.Visibility = Visibility.Collapsed;
        ShowPreviewButton.Content = "Hide Preview";
    }
    else
    {
        swapChainPanel.Visibility = Visibility.Collapsed;
        spMain.Visibility = Visibility.Visible;
        ShowPreviewButton.Content = "Show Preview";
    }

    ringLoading.IsActive = false;
}

Here, DisplayEmotionsData is used to display the emotion scores in TextBlocks as follows.

private void DisplayEmotionsData(DetectedFace[] faces, bool init = true)
{
    if (faces == null)
        return;
    if (!init)
        return;

    // Remove the TextBlocks that were added for the previous result
    // (they are tagged when they are created below).
    var list_child = gridEmotions.Children.ToList();
    list_child.ForEach((e) =>
    {
        if (e as TextBlock != null && (e as TextBlock).Tag != null)
        {
            gridEmotions.Children.Remove(e);
        }
    });

    int index = 1;
    foreach (var face in faces)
    {
        var emotion = face.FaceAttributes.Emotion;

        // One row per face: the face number followed by one column per emotion score.
        var values = new string[]
        {
            "#" + index,
            Math.Round(emotion.Anger, 2).ToString(),
            Math.Round(emotion.Contempt, 2).ToString(),
            Math.Round(emotion.Disgust, 2).ToString(),
            Math.Round(emotion.Fear, 2).ToString(),
            Math.Round(emotion.Happiness, 2).ToString(),
            Math.Round(emotion.Neutral, 2).ToString(),
            Math.Round(emotion.Sadness, 2).ToString(),
            Math.Round(emotion.Surprise, 2).ToString()
        };

        for (int column = 0; column < values.Length; column++)
        {
            TextBlock txt = new TextBlock();
            txt.Padding = new Thickness(1);
            txt.FontSize = 11;
            txt.Text = values[column];
            Grid.SetRow(txt, index + 1);
            Grid.SetColumn(txt, column);
            txt.Tag = true;
            gridEmotions.Children.Add(txt);
        }

        index++;
    }
}

We use DisplayFacesData to overlay an orange rectangle on the image wherever a face is detected.

private void DisplayFacesData(DetectedFace[] faces, bool init = true)
{
    if (faces == null)
        return;

    cvasMain.Children.Clear();

    // Compute the scale factor and letterbox offsets so that face coordinates
    // (in image pixels) map onto the Uniform-stretched image inside the canvas.
    var offset_h = 0.0; var offset_w = 0.0;
    var p = 0.0;
    var d = cvasMain.ActualHeight / cvasMain.ActualWidth;
    var d2 = size_image.Height / size_image.Width;
    if (d < d2)
    {
        offset_h = 0;
        offset_w = (cvasMain.ActualWidth - cvasMain.ActualHeight / d2) / 2;
        p = cvasMain.ActualHeight / size_image.Height;
    }
    else
    {
        offset_w = 0;
        offset_h = (cvasMain.ActualHeight - cvasMain.ActualWidth / d2) / 2;
        p = cvasMain.ActualWidth / size_image.Width;
    }
    if (faces != null)
    {
        int count = 1;
        foreach (var face in faces)
        {
            Windows.UI.Xaml.Shapes.Rectangle rect = new Windows.UI.Xaml.Shapes.Rectangle();
            rect.Width = face.FaceRectangle.Width * p;
            rect.Height = face.FaceRectangle.Height * p;
            Canvas.SetLeft(rect, face.FaceRectangle.Left * p + offset_w);
            Canvas.SetTop(rect, face.FaceRectangle.Top * p + offset_h);
            rect.Stroke = new SolidColorBrush(Colors.Orange);
            rect.StrokeThickness = 3;

            cvasMain.Children.Add(rect);

            // Label each rectangle with its face index.
            TextBlock txt = new TextBlock();
            txt.Foreground = new SolidColorBrush(Colors.Orange);
            txt.Text = "#" + count;
            Canvas.SetLeft(txt, face.FaceRectangle.Left * p + offset_w);
            Canvas.SetTop(txt, face.FaceRectangle.Top * p + offset_h - 20);
            cvasMain.Children.Add(txt);
            count++;
        }
    }
}

Once the response is received from Azure, every recognized face on the screen is overlaid with an orange rectangle, as shown in Fig. 4.

Fig. 4 The result of Face API

Summary

In this tutorial, we showed the steps to connect DJI drones with Azure Cognitive Services. You can take a screenshot and send it to the Azure Face API to get face detection results, which are then displayed on the UI.

Resources

1. Azure Cloud Services for DJI Drones: https://www.hackster.io/JiongShi/azure-cloud-services-for-dji-drones-d8d3a3
2. Source Code: https://github.com/shijiong/AzureCloudService4DJI
3. DJI Windows SDK: https://developer.dji.com/windows-sdk/ 

See Also

1. Connecting DJI Drones with Azure IoT Hub: https://social.technet.microsoft.com/wiki/contents/articles/54262.connecting-dji-drones-with-azure-iot-hub.aspx 
2. Connecting DJI Drones with Azure Table Storage: https://social.technet.microsoft.com/wiki/contents/articles/54263.connecting-dji-drones-with-azure-table-storage.aspx