Effects for analyzing camera frames

This article describes how to use the SceneAnalysisEffect and the FaceDetectionEffect to analyze the content of the media capture preview stream.

Scene analysis effect

The SceneAnalysisEffect analyzes the video frames in the media capture preview stream and recommends processing options to improve the capture result. Currently, the effect supports detecting whether the capture would be improved by using High Dynamic Range (HDR) processing.

If the effect recommends using HDR, you can act on the recommendation in the following ways:

- Use the AdvancedPhotoCapture class to capture photos using the Windows built-in HDR processing algorithm. For more information, see High Dynamic Range (HDR) and low-light photo capture.
- Use the HdrVideoControl to capture video using the Windows built-in HDR processing algorithm (a minimal sketch follows this list). For more information, see Capture device controls for video capture.
- Use the VariablePhotoSequenceController to capture a sequence of frames that you can then composite using a custom HDR implementation. For more information, see Variable photo sequence.
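
As a minimal sketch of the second option, assuming your MediaCapture object (here, _mediaCapture) has already been initialized, you could turn on the built-in HDR video processing when the capture device supports it:

// Enable the Windows built-in HDR video processing, if the device supports it
var hdrVideoControl = _mediaCapture.VideoDeviceController.HdrVideoControl;

if (hdrVideoControl.Supported)
{
    hdrVideoControl.Mode = HdrVideoMode.Auto;
}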

Scene analysis namespaces

To use scene analysis, your app must include the following namespaces in addition to the namespaces required for basic media capture.

using Windows.Media.Core;
using Windows.Media.Devices;

Initialize the scene analysis effect and add it to the preview stream

Video effects are implemented using two APIs: an effect definition, which provides the settings that the capture device needs to initialize the effect, and an effect instance, which can be used to control the effect after it has been added. Since you may want to access the effect instance from multiple places within your code, you should typically declare a member variable to hold the object.

private SceneAnalysisEffect _sceneAnalysisEffect;

In your app, after you have initialized the MediaCapture object, create a new instance of SceneAnalysisEffectDefinition.

Register the effect with the capture device by calling AddVideoEffectAsync on your MediaCapture object, providing the SceneAnalysisEffectDefinition and specifying MediaStreamType.VideoPreview to indicate that the effect should be applied to the video preview stream, as opposed to the capture stream. AddVideoEffectAsync returns an instance of the added effect. Because this method can be used with multiple effect types, you must cast the returned instance to a SceneAnalysisEffect object.

To receive the results of the scene analysis, you must register a handler for the SceneAnalyzed event.

Currently, the scene analysis effect only includes the high dynamic range analyzer. Enable HDR analysis by setting the effect's HighDynamicRangeAnalyzer.Enabled property to true.

// Create the definition
var definition = new SceneAnalysisEffectDefinition();

// Add the effect to the video preview stream
_sceneAnalysisEffect = (SceneAnalysisEffect)await _mediaCapture.AddVideoEffectAsync(definition, MediaStreamType.VideoPreview);

// Subscribe to notifications about scene information
_sceneAnalysisEffect.SceneAnalyzed += SceneAnalysisEffect_SceneAnalyzed;

// Enable HDR analysis
_sceneAnalysisEffect.HighDynamicRangeAnalyzer.Enabled = true;

Implement the SceneAnalyzed event handler

The results of the scene analysis are returned in the SceneAnalyzed event handler. The SceneAnalyzedEventArgs object passed into the handler has a ResultFrame property that returns a SceneAnalysisEffectFrame object, whose HighDynamicRange property returns a HighDynamicRangeOutput object. The Certainty property of the high dynamic range output provides a value between 0 and 1.0, where 0 indicates that HDR processing would not help improve the capture result and 1.0 indicates that HDR processing would help. You can decide the threshold at which you want to use HDR, or you can show the results to the user and let them decide.

private void SceneAnalysisEffect_SceneAnalyzed(SceneAnalysisEffect sender, SceneAnalyzedEventArgs args)
{
    double hdrCertainty = args.ResultFrame.HighDynamicRange.Certainty;

    // Certainty value is between 0.0 and 1.0.
    // MyCertaintyCap is an app-defined threshold, and ShowMessageToUser is an
    // app-defined helper for surfacing the recommendation in your UI.
    if (hdrCertainty > MyCertaintyCap)
    {
        ShowMessageToUser("Enabling HDR capture is recommended.");
    }
}

The HighDynamicRangeOutput object passed into the handler also has a FrameControllers property, which contains suggested frame controllers for capturing a variable photo sequence for HDR processing. For more information, see Variable photo sequence. A sketch of how you might apply these suggestions is shown below.
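
For example, inside the SceneAnalyzed handler you could copy the suggested frame controllers into the device's variable photo sequence controller. This is a hedged sketch, not the complete workflow: it assumes the device supports variable photo sequences, and preparing and capturing the sequence itself is covered in the variable photo sequence article.

var vpsController = _mediaCapture.VideoDeviceController.VariablePhotoSequenceController;

if (vpsController.Supported)
{
    // Replace the current desired frame controllers with the analyzer's suggestions
    vpsController.DesiredFrameControllers.Clear();

    foreach (var frameController in args.ResultFrame.HighDynamicRange.FrameControllers)
    {
        vpsController.DesiredFrameControllers.Add(frameController);
    }
}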

Clean up the scene analysis effect

When your app is done capturing, before disposing of the MediaCapture object, you should disable the scene analysis effect by setting the effect's HighDynamicRangeAnalyzer.Enabled property to false and unregister your SceneAnalyzed event handler. Call MediaCapture.ClearEffectsAsync, specifying the video preview stream since that was the stream to which the effect was added. Finally, set your member variable to null.

// Disable detection
_sceneAnalysisEffect.HighDynamicRangeAnalyzer.Enabled = false;

_sceneAnalysisEffect.SceneAnalyzed -= SceneAnalysisEffect_SceneAnalyzed;

// Remove the effect from the preview stream
await _mediaCapture.ClearEffectsAsync(MediaStreamType.VideoPreview);

// Clear the member variable that held the effect instance
_sceneAnalysisEffect = null;

Face detection effect

The FaceDetectionEffect identifies the location of faces within the media capture preview stream. The effect allows you to receive a notification whenever a face is detected in the preview stream and provides the bounding box for each detected face within the preview frame. On supported devices, the face detection effect also provides enhanced exposure and focus on the most important face in the scene.

Face detection namespaces

To use face detection, your app must include the following namespaces in addition to the namespaces required for basic media capture.

using Windows.Media.Core;
using Windows.Graphics.Imaging;

Initialize the face detection effect and add it to the preview stream

Video effects are implemented using two APIs: an effect definition, which provides the settings that the capture device needs to initialize the effect, and an effect instance, which can be used to control the effect after it has been added. Since you may want to access the effect instance from multiple places within your code, you should typically declare a member variable to hold the object.

private FaceDetectionEffect _faceDetectionEffect;

In your app, after you have initialized the MediaCapture object, create a new instance of FaceDetectionEffectDefinition. Set the DetectionMode property to prioritize faster face detection or more accurate face detection. Set SynchronousDetectionEnabled to false so that incoming frames are not delayed waiting for face detection to complete, which can result in a choppy preview experience.

Register the effect with the capture device by calling AddVideoEffectAsync on your MediaCapture object, providing the FaceDetectionEffectDefinition and specifying MediaStreamType.VideoPreview to indicate that the effect should be applied to the video preview stream, as opposed to the capture stream. AddVideoEffectAsync returns an instance of the added effect. Because this method can be used with multiple effect types, you must cast the returned instance to a FaceDetectionEffect object.

Enable or disable the effect by setting the FaceDetectionEffect.Enabled property. Adjust how often the effect analyzes frames by setting the FaceDetectionEffect.DesiredDetectionInterval property. Both of these properties can be adjusted while media capture is ongoing.

// Create the definition, which will contain some initialization settings
var definition = new FaceDetectionEffectDefinition();

// To ensure preview smoothness, do not delay incoming samples
definition.SynchronousDetectionEnabled = false;

// In this scenario, choose detection speed over accuracy
definition.DetectionMode = FaceDetectionMode.HighPerformance;

// Add the effect to the preview stream
_faceDetectionEffect = (FaceDetectionEffect)await _mediaCapture.AddVideoEffectAsync(definition, MediaStreamType.VideoPreview);

// Choose the shortest interval between detection events
_faceDetectionEffect.DesiredDetectionInterval = TimeSpan.FromMilliseconds(33);

// Start detecting faces
_faceDetectionEffect.Enabled = true;

Receive notifications when faces are detected

If you want to perform some action when faces are detected, such as drawing a box around detected faces in the video preview, you can register for the FaceDetected event.

// Register for face detection events
_faceDetectionEffect.FaceDetected += FaceDetectionEffect_FaceDetected;

In the handler for the event, you can get a list of all faces detected in a frame by accessing the FaceDetectionEffectFrame.DetectedFaces property of the FaceDetectedEventArgs. Each face in the list is a DetectedFace object whose FaceBox property is a BitmapBounds structure describing the rectangle that contains the detected face, in units relative to the preview stream dimensions. To view sample code that transforms the preview stream coordinates into screen coordinates, see the face detection UWP sample; a simplified sketch of that transformation follows the handler below.

private void FaceDetectionEffect_FaceDetected(FaceDetectionEffect sender, FaceDetectedEventArgs args)
{
    foreach (Windows.Media.FaceAnalysis.DetectedFace face in args.ResultFrame.DetectedFaces)
    {
        BitmapBounds faceRect = face.FaceBox;

        // Draw a rectangle on the preview stream for each face
    }
}
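
The face detection UWP sample handles scaling modes and rotation; the following hedged sketch shows only the simplest case. It assumes the preview exactly fills a CaptureElement named PreviewControl (no letterboxing) and that a hypothetical _previewProperties field holds the VideoEncodingProperties of the preview stream. Note also that FaceDetected is typically raised on a worker thread, so any drawing should be dispatched to the UI thread.

private Windows.Foundation.Rect ConvertPreviewToUiRectangle(BitmapBounds faceBox)
{
    // Scale from preview-stream pixel coordinates to the displayed preview size.
    // PreviewControl and _previewProperties are hypothetical app-defined members.
    double scaleX = PreviewControl.ActualWidth / _previewProperties.Width;
    double scaleY = PreviewControl.ActualHeight / _previewProperties.Height;

    return new Windows.Foundation.Rect(
        faceBox.X * scaleX,
        faceBox.Y * scaleY,
        faceBox.Width * scaleX,
        faceBox.Height * scaleY);
}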

Clean up the face detection effect

When your app is done capturing, before disposing of the MediaCapture object, you should disable the face detection effect by setting the FaceDetectionEffect.Enabled property to false and unregister your FaceDetected event handler if you previously registered one. Call MediaCapture.ClearEffectsAsync, specifying the video preview stream since that was the stream to which the effect was added. Finally, set your member variable to null.

// Disable detection
_faceDetectionEffect.Enabled = false;

// Unregister the event handler
_faceDetectionEffect.FaceDetected -= FaceDetectionEffect_FaceDetected;

// Remove the effect from the preview stream
await _mediaCapture.ClearEffectsAsync(MediaStreamType.VideoPreview);

// Clear the member variable that held the effect instance
_faceDetectionEffect = null;

Check for focus and exposure support for detected faces

Not all devices have a capture device that can adjust its focus and exposure based on detected faces. Because face detection consumes device resources, you may only want to enable face detection on devices that can use the feature to enhance capture. To see whether face-based capture optimization is available, get the VideoDeviceController for your initialized MediaCapture and then get the video device controller's RegionsOfInterestControl. Check whether MaxRegions supports at least one region, and then check whether either AutoExposureSupported or AutoFocusSupported is true. If these conditions are met, the device can take advantage of face detection to enhance capture.

var regionsControl = _mediaCapture.VideoDeviceController.RegionsOfInterestControl;
bool faceDetectionFocusAndExposureSupported =
    regionsControl.MaxRegions > 0 &&
    (regionsControl.AutoExposureSupported || regionsControl.AutoFocusSupported);