Mysteries of Kinect for Windows Face Tracking output explained

Since the release of Kinect for Windows version 1.5, developers have been able to use the Face Tracking software development kit (SDK) to create applications that can track human faces in real time. Figure 1, an illustration from the Face Tracking documentation, displays 87 of the points used to track the face. Thirteen points are not illustrated here—more on those points later.

Figure 1: Tracked points

You have questions...

Based on feedback we received via comments and forum posts, it is clear there is some confusion regarding the face tracking points and the data values found when using the SDK sample code. The managed sample, FaceTrackingBasics-WPF, demonstrates how to visualize mesh data by displaying a 3D model representation on top of the color camera image.

Figure 2: Screenshot from FaceTrackingBasics-WPF

By exploring this sample source code, you will find a set of helper functions under the Microsoft.Kinect.Toolkit.FaceTracking project, in particular GetProjected3DShape(). Many developers have found that this function returns a collection of 121 values. Some have also found an enum, called “FeaturePoint”, that includes 70 items.

We have answers...

As you can see, we have two main sets of numbers that don't seem to add up. This is because the SDK provides two distinct sets of values:

  1. 3D Shape Points (mesh representation of the face): 121
  2. Tracked Points: 87 + 13

The 3D Shape Points (121 of them) are the mesh vertices that make a 3D face model based on the Candide-3 wireframe.

Figure 3: Wireframe of the Candide-3 model https://www.icg.isy.liu.se/candide/img/candide3_rot128.gif

These vertices are morphed by the FaceTracking APIs to align with the face. The GetProjected3DShape method returns the vertices as an array of Vector3DF values. These values can be looked up by name using the "FeaturePoint" enum, for example TopSkull, LeftCornerMouth, or OuterTopRightPupil. Figure 4 shows these values superimposed on top of the color frame.

Figure 4: Feature Point index mapped on mesh model
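
To make the relationship between the named feature points and the mesh vertices concrete, here is a minimal sketch that does not use the SDK at all. SketchFeaturePoint, SketchVertex, and the index values are hypothetical stand-ins, not the toolkit's real definitions. The only idea carried over is the assumption that each enum name is an index into the 121-element vertex array, which is why there can be fewer names than vertices.

```csharp
using System;

// Hypothetical stand-in for the toolkit's FeaturePoint enum.
// The index values here are illustrative, not the real ones.
public enum SketchFeaturePoint
{
    TopSkull = 0,
    LeftCornerMouth = 31,
    OuterTopRightPupil = 66
}

// Hypothetical stand-in for a mesh vertex (three floats).
public struct SketchVertex
{
    public float X;
    public float Y;
    public float Z;
}

public static class FeaturePointSketch
{
    // The article reports 121 vertices in the face mesh.
    public const int MeshVertexCount = 121;

    // Builds a fake mesh in which vertex i has X == i, so lookups are easy to check.
    public static SketchVertex[] BuildFakeMesh()
    {
        var mesh = new SketchVertex[MeshVertexCount];
        for (int i = 0; i < mesh.Length; i++)
        {
            mesh[i] = new SketchVertex { X = i, Y = 0f, Z = 0f };
        }
        return mesh;
    }

    // A named feature point is treated as an index into the vertex array.
    public static SketchVertex Lookup(SketchVertex[] mesh, SketchFeaturePoint name)
    {
        return mesh[(int)name];
    }
}
```

Calling FeaturePointSketch.Lookup(FeaturePointSketch.BuildFakeMesh(), SketchFeaturePoint.LeftCornerMouth) returns the vertex stored at the index the enum member stands for. Vertices without a name can still be reached by their plain integer index.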

To get the 100 tracked points mentioned above, we need to dive more deeply into the APIs. The managed APIs provide an FtInterop.cs file that defines an interface, IFTResult, which contains a Get2DShapePoints function. FtInterop is a wrapper for the native library that exposes its functionality to managed languages. Users of the unmanaged C++ API may have already seen this and figured it out. Get2DShapePoints is the function that provides the 100 tracked points.

If we have a look at the function, it doesn’t seem to be useful to a managed code developer:

// STDMETHOD(Get2DShapePoints)(THIS_ FT_VECTOR2D** ppPoints, UINT* pPointCount) PURE;
void Get2DShapePoints(out IntPtr pointsPtr, out uint pointCount);

To get a better idea of how you can get a collection of points from IntPtr, we need to dive into the unmanaged function:

/// <summary>
/// Returns 2D (X,Y) coordinates of the key points on the aligned face in video frame coordinates.
/// </summary>
/// <param name="ppPoints">Array of 2D points (as FT_VECTOR2D).</param>
/// <param name="pPointCount">Number of elements in ppPoints.</param>
/// <returns>If the method succeeds, the return value is S_OK. If the method fails, the return value can be E_POINTER.</returns>
STDMETHOD(Get2DShapePoints)(THIS_ FT_VECTOR2D** ppPoints, UINT* pPointCount) PURE;

The function will give us a pointer to the FT_VECTOR2D array. To consume the data from the pointer, we have to create a new function for use with managed code.

The managed code

First, you need to create an array to contain the data that is copied to managed memory. Since FT_VECTOR2D is an unmanaged structure, to marshal the data to the managed wrapper, we must have an equivalent data type to match. The managed version of this structure is PointF (structure that uses floats for x and y).

Now that we have a data type, we need to convert IntPtr to PointF[]. Searching the code, we see that the FaceTrackFrame class wraps the IFTResult object. This also contains the GetProjected3DShape() function we used before, so this is a good candidate to add a new function, GetShapePoints. It will look something like this:

// returns the 2D tracked shape points, or null if none are available
public PointF[] GetShapePoints()
{
    // get the 2D tracked shapes
    IntPtr ptBuffer;
    uint ptCount;
    this.faceTrackingResultPtr.Get2DShapePoints(out ptBuffer, out ptCount);
    if (ptBuffer == IntPtr.Zero || ptCount == 0)
    {
        return null;
    }

    // create a managed array to hold the values
    PointF[] shapePoints = new PointF[ptCount];

    // copy each unmanaged FT_VECTOR2D into its managed equivalent
    int sizeInBytes = Marshal.SizeOf(typeof(PointF));
    for (int i = 0; i < shapePoints.Length; i++)
    {
        shapePoints[i] = (PointF)Marshal.PtrToStructure(IntPtr.Add(ptBuffer, i * sizeInBytes), typeof(PointF));
    }

    return shapePoints;
}
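
If you want to try the marshaling step without a sensor attached, the following self-contained sketch may help. It is not part of the toolkit: SketchPoint2D is a hypothetical stand-in for the managed equivalent of FT_VECTOR2D, and RoundTrip simulates the native side by writing points into unmanaged memory before reading them back. It assumes .NET 4 or later for IntPtr.Add.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the managed equivalent of FT_VECTOR2D (two floats).
[StructLayout(LayoutKind.Sequential)]
public struct SketchPoint2D
{
    public float X;
    public float Y;
}

public static class MarshalSketch
{
    // Copies 'count' consecutive structures starting at 'buffer' into a managed array.
    public static SketchPoint2D[] CopyPoints(IntPtr buffer, int count)
    {
        if (buffer == IntPtr.Zero || count <= 0)
        {
            return new SketchPoint2D[0];
        }

        int size = Marshal.SizeOf(typeof(SketchPoint2D));
        var result = new SketchPoint2D[count];
        for (int i = 0; i < count; i++)
        {
            // step through the unmanaged buffer one structure at a time
            IntPtr element = IntPtr.Add(buffer, i * size);
            result[i] = (SketchPoint2D)Marshal.PtrToStructure(element, typeof(SketchPoint2D));
        }

        return result;
    }

    // Simulates the native side: writes the points to unmanaged memory,
    // then reads them back through CopyPoints.
    public static SketchPoint2D[] RoundTrip(SketchPoint2D[] source)
    {
        int size = Marshal.SizeOf(typeof(SketchPoint2D));
        IntPtr buffer = Marshal.AllocHGlobal(size * source.Length);
        try
        {
            for (int i = 0; i < source.Length; i++)
            {
                Marshal.StructureToPtr(source[i], IntPtr.Add(buffer, i * size), false);
            }

            return CopyPoints(buffer, source.Length);
        }
        finally
        {
            // this sketch owns the buffer, so it must free it
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

One difference from the real call is worth noting: here the sketch allocates the buffer and therefore frees it. The GetShapePoints function above only reads from the pointer it is handed and does not free it.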

To ensure we are using the data correctly, we refer to the documentation on Get2DShapePoints:

IFTResult::Get2DShapePoints Method gets the (x,y) coordinates of the key points on the aligned face in video frame coordinates.

The PointF values represent the mapped values on the color image. Since we know they match the color frame, there is no need to apply any mapping. You can call the function to get the data, which should align with the color image coordinates.
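
Because the shape points are already in color-frame coordinates, they can be used directly for 2D work, for example finding the rectangle that encloses the face. The following is a minimal sketch under that assumption. SketchPointF is a hypothetical stand-in for the toolkit's PointF, and TryGetBounds is an illustrative helper, not an SDK function.

```csharp
using System;

// Hypothetical stand-in for the toolkit's PointF (two floats).
public struct SketchPointF
{
    public float X;
    public float Y;
}

public static class ShapePointBounds
{
    // Computes the axis-aligned bounding box of the points, in the same
    // (color frame) coordinates as the input. Returns false if there are no points.
    public static bool TryGetBounds(
        SketchPointF[] points,
        out float left, out float top, out float right, out float bottom)
    {
        left = top = right = bottom = 0f;
        if (points == null || points.Length == 0)
        {
            return false;
        }

        left = right = points[0].X;
        top = bottom = points[0].Y;
        for (int i = 1; i < points.Length; i++)
        {
            left = Math.Min(left, points[i].X);
            right = Math.Max(right, points[i].X);
            top = Math.Min(top, points[i].Y);
            bottom = Math.Max(bottom, points[i].Y);
        }

        return true;
    }
}
```

The resulting rectangle could be used to crop or highlight the face in the color image, since no further coordinate conversion should be required.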

The sample code

The modified version of FaceTrackingBasics-WPF is available in the sample code that can be downloaded from CodePlex. It has been modified to allow you to display the feature points (by name or by index value) and toggle the mesh drawing. Because of the way WPF renders, performance can suffer on machines with lower-end graphics cards, so I recommend enabling only one of these features at a time. If your UI becomes unresponsive, you can block the sensor with your hand to stop face tracking data from being captured. Because the application will not detect a tracked face, it will not render any points, which gives you the opportunity to reset the features you enabled by using the UI controls.

Figure 5: ShapePoints mapped around the face

As you can see in Figure 5, the additional 13 points are the center of the eyes, the tip of the nose, and the areas above the eyebrows on the forehead. Once you enable a feature and tracking begins, you can zoom into the center and see the values more clearly.

A summary of the changes:

MainWindow.xaml/.cs:

  • UI changes to enable slider and draw selections

FaceTrackingViewer.cs:

  • Added a Grid control, used for the UI elements
  • Modified the constructor to initialize the grid
  • Modified the OnAllFramesReady event handler
    • For each tracked skeleton, create a canvas and add it to the grid. The canvas is the parent of the label controls

public partial class FaceTrackingViewer : UserControl, IDisposable
{
    private Grid grid;

    public FaceTrackingViewer()
    {
        this.InitializeComponent();

        // add grid to the layout
        this.grid = new Grid();
        this.grid.Background = Brushes.Transparent;
        this.Content = this.grid;
    }

    private void OnAllFramesReady(object sender, AllFramesReadyEventArgs allFramesReadyEventArgs)
    {
        ...
        // We want to keep a record of any skeleton, tracked or untracked.
        if (!this.trackedSkeletons.ContainsKey(skeleton.TrackingId))
        {
            // create a new canvas for each tracker
            Canvas canvas = new Canvas();
            canvas.Background = Brushes.Transparent;
            this.grid.Children.Add(canvas);

            this.trackedSkeletons.Add(skeleton.TrackingId, new SkeletonFaceTracker(canvas));
        }
        ...
    }
}

SkeletonFaceTracker class changes:

  • New properties and fields: DrawFaceMesh, DrawShapePoints, DrawFeaturePoints, featurePoints, lastDrawFeaturePoints, shapePoints, labelControls, Canvas
  • New functions: FindTextControl, UpdateTextControls, RemoveAllFromCanvas, SetShapePointsLocations, SetFeaturePointsLocations
  • Added a constructor that keeps track of the parent control
  • Changed the DrawFaceModel function to draw based on what data was selected
  • Updated the OnFrameReady event handler to recalculate the positions of the drawn elements
    • If DrawShapePoints is selected, then we call our new function

private class SkeletonFaceTracker : IDisposable
{
    ...

    // properties to toggle rendering 3D mesh, shape points and feature points
    public bool DrawFaceMesh { get; set; }
    public bool DrawShapePoints { get; set; }
    public DrawFeaturePoint DrawFeaturePoints { get; set; }

    // defined array for the feature points
    private Array featurePoints;
    private DrawFeaturePoint lastDrawFeaturePoints;

    // array for Points to be used in shape points rendering
    private PointF[] shapePoints;

    // map to hold the label controls for the overlay
    private Dictionary<string, Label> labelControls;

    // canvas control for new text rendering
    private Canvas Canvas;

    // canvas is passed in for every instance
    public SkeletonFaceTracker(Canvas canvas)
    {
        this.Canvas = canvas;
    }

    public void DrawFaceModel(DrawingContext drawingContext)
    {
        ...
        // only draw if selected
        if (this.DrawFaceMesh && this.facePoints != null)
        {
            ...
        }
    }

    internal void OnFrameReady(KinectSensor kinectSensor, ColorImageFormat colorImageFormat, byte[] colorImage, DepthImageFormat depthImageFormat, short[] depthImage, Skeleton skeletonOfInterest)
    {
        ...
        if (this.lastFaceTrackSucceeded)
        {
            ...
            if (this.DrawFaceMesh || this.DrawFeaturePoints != DrawFeaturePoint.None)
            {
                this.facePoints = frame.GetProjected3DShape();
            }

            // get the shape points array
            if (this.DrawShapePoints)
            {
                this.shapePoints = frame.GetShapePoints();
            }
        }

        // draw/remove the components
        SetFeaturePointsLocations();
        SetShapePointsLocations();
    }
    ...
}

Pulling it all together...

As we have seen, the Face Tracking SDK exposes three kinds of data points:

  • Shape Points: data used to track the face
  • Mesh Data: vertices of the 3D model from the GetProjected3DShape() function
  • FeaturePoints: named vertices on the 3D model that play a significant role in face tracking

To get the shape point data, we have to extend the current managed wrapper with a new function that will handle the interop with the native API.

Carmine Sirignano
Developer Support Escalation Engineer
Kinect for Windows

Comments

  • Anonymous
    March 08, 2014
    Hi! I am hardly looking for the authors of those algorithms, can you please indicate the relevant publication(s)? Thanks a lot! Mar

  • Anonymous
    June 22, 2014
    Mar, I believe the following publication is what you are looking for: research.microsoft.com/.../CVPR10_FaceTrack.pdf Peng

  • Anonymous
    November 18, 2014
    It's probably helpful to rely on the authors of the original candide-3 model: www.icg.isy.liu.se/candide If you wish to know how they did the estimation: "Face Tracking for Model-based Coding and Face Animation", 2003, Jörgen Ahlberg , Robert Forchheimer citeseerx.ist.psu.edu/.../summary

  • Anonymous
    December 03, 2014
    Hello. I have installed version 2 of Kinect that comes with an integrated developer's toolkit. Problem is that when I load the sample face tracking project (Face Basics D2D), a majority of C++'s own headers like windows.h, string library, strsafe etc come out to be undefined (although all of Kinect's own headers and libraries appear OK). Can you tell me exactly how I am supposed to run that code? What are the steps? I would be really grateful for your help. I am really stuck at this point, any help would be appreciated. Thanks in advance.

  • Anonymous
    December 04, 2014
    Hello Samantha, Please post your question in our community forum, where engineers and developers actively monitor and respond to questions like yours. Thank you and good luck! http://aka.ms/k4wv2forum

  • Anonymous
    December 31, 2014
    In the FacTrackingViewer.cs class, I need to extract each vector in Get3DShape to an array and have each array of 121 elements mapped to memory for interprocess communication with labview. I have the following: int n = 121;                        foreach (Vector3DF[] vector in facePoints3D.GetSlices(n))                        {                             //var bytearray = vector.Select(f => Convert.ToByte(f)).ToArray();                            byte[] bytearray = new byte[vector.Length * this.facePoints3D.Count];                            //byte[] bytearray = new byte[vector.Length * sizeof(Vector3DF)];                             Buffer.BlockCopy(vector, 0, bytearray, 0, bytearray.Length);                            //Array.Copy(vector, 0, bytearray, 0, vector.Length);                        } But somehow I am not runnning into luck in my tracking code as I keep getting an argument exception error in bytearray as the debugger tells me it is not a src of primitives. Please help!

  • Anonymous
    January 01, 2015
    Hello LexiLighty, Please post your question in our community forum where you'll find many developers and engineers responding to questions. Thank you! http://aka.ms/k4wv2forum

  • Anonymous
    May 04, 2015
    In which way can I track the 3d vertices for the face you have already mentioned in case of kinect 2.0?