Embedded content safety (preview)

2025-03-26

Embedded content safety is designed for on-device scenarios where cloud connectivity is intermittent or the user prefers on-device access for privacy reasons.

You can use embedded content safety locally on a PC to detect harmful content generated by a large language model, or in a car that might travel out of network range. You can also develop hybrid solutions that combine cloud and offline analysis. For scenarios where your devices must operate in a secure environment, such as a bank or government entity, first consider disconnected containers.
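A hybrid solution can be sketched as a cloud-first call with an on-device fallback. The following is a minimal illustration only: `SafetyResult`, `Analyzer`, and `analyzeHybrid` are hypothetical names introduced here and are not part of the Content Safety SDK. The two analyzers are placeholders that the caller supplies.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical result type: one severity value per harm category.
struct SafetyResult {
    int hate = 0;
    int selfHarm = 0;
    int sexual = 0;
    int violence = 0;
    bool fromCloud = false;  // records which backend produced the result
};

// An analyzer returns std::nullopt when it cannot produce a result,
// for example because the network is unreachable.
using Analyzer = std::function<std::optional<SafetyResult>(const std::string&)>;

// Try the cloud analyzer first; fall back to the on-device analyzer when the
// cloud call yields nothing. Both analyzers are caller-supplied placeholders.
std::optional<SafetyResult> analyzeHybrid(const std::string& text,
                                          const Analyzer& cloud,
                                          const Analyzer& embedded) {
    if (cloud) {
        if (auto result = cloud(text)) {
            result->fromCloud = true;
            return result;
        }
    }
    if (embedded) {
        if (auto result = embedded(text)) {
            result->fromCloud = false;
            return result;
        }
    }
    return std::nullopt;  // neither backend could analyze the text
}
```

Because the embedded models can produce output that differs from the Azure API, the sketch records which backend answered so that the application can account for that difference.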

Important

Microsoft limits access to embedded content safety. You can apply for access through the Azure AI Content Safety limited access review for embedded content safety. Instructions are provided after you successfully complete the review process. For more information, see Limited access.

Platform requirements

Embedded content safety is included with the Azure AI Content Safety C++ SDK.

Choose your target environment

Embedded content safety only supports Windows. Contact your Microsoft account administrator if you need to run embedded content safety on a different platform.

Requires Windows 10 or newer on x64 hardware.

The latest Microsoft Visual C++ Redistributable for Visual Studio 2015-2022 must be installed regardless of the programming language used with the Content Safety SDK.

Limitations

Embedded content safety is only available with the C++ SDK. The other Content Safety SDKs and REST APIs don't support embedded content safety.

Embedded content safety SDK packages

For C++ embedded applications, install the following C++ packages:

Package	Description
Azure.AI.ContentSafety.Extension.Embedded.Text	Required to run text analysis on device
Azure.AI.ContentSafety.Extension.Embedded.Image	Required to run image analysis on device

Models

For embedded content safety, you need to download the content safety models to your device.

Embedded content safety supports the Analyze text and Analyze image features. These features scan text or image content for sexual content, violence, hate, and self-harm, and report multiple severity levels.

The embedded models are optimized for on-device execution and use fewer computational resources than the Azure API. As a result, the output of an embedded content safety model might differ from that of the Azure API.

Code samples

Below is the ready-to-use embedded content safety sample. Follow the readme file to run the sample.

C++ sample

Performance evaluations

Embedded content safety models run fully on your target devices. Understanding how these models perform on your devices' hardware can be critical to delivering low-latency experiences in your products and applications. This section provides information to help you determine whether your device is suitable for running embedded content safety for text analysis or image analysis.

Factors that affect performance

Device specifications – The specifications of your device play a key role in whether embedded content safety models can run without performance issues. CPU clock speed, architecture (for example, x64 or ARM), and memory can all affect model inference speed.

CPU/GPU load – In most cases, your device is running other applications in parallel to the application where embedded content safety models are integrated. The amount of CPU/GPU load your device experiences when idle and at peak can also affect performance.

For example, if other applications already place the device under moderate to high CPU load, running embedded content safety alongside them can cause performance issues, even with a powerful processor.

Memory load – An embedded content safety text analysis process consumes about 900 MB of memory at runtime. If your device has less memory than that available for the process, frequent fallbacks to virtual memory and paging can slow inference and increase user-perceived latency.
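A simple pre-flight check can compare the free physical memory against the approximate footprint given above. This is a minimal sketch: the function name and the 20% safety margin are illustrative assumptions, not SDK values or recommendations.

```cpp
#include <cstdint>

// Approximate runtime footprint of text analysis (about 900 MB, from the text above).
constexpr std::uint64_t kTextModelBytes = 900ULL * 1024 * 1024;

// Returns true when the available physical memory covers the model footprint
// plus a safety margin. The 20% default margin is an illustrative assumption.
constexpr bool hasMemoryHeadroom(std::uint64_t availableBytes,
                                 std::uint64_t requiredBytes = kTextModelBytes,
                                 double margin = 0.20) {
    return static_cast<double>(availableBytes) >=
           static_cast<double>(requiredBytes) * (1.0 + margin);
}
```

On Windows, the available value could come from the `ullAvailPhys` field that `GlobalMemoryStatusEx` fills in. The check is kept as a pure function so that it can be tested without touching the operating system.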

SDK parameters that can affect performance

The following SDK parameters can impact the inference time of the embedded content safety model.

gpuEnabled – Set to true to enable the GPU; otherwise the CPU is used. Inference time is generally shorter on the GPU.

numThreads – Applies only to CPU inference. Defines the number of threads to use in a multi-threaded environment. A maximum of four threads is supported.
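The two parameters above can be chosen programmatically. The sketch below assumes a hypothetical `EmbeddedOptions` struct whose fields mirror the parameter names; it is not the SDK's own type, and how the values are passed to the SDK is not shown.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical options holder; the field names mirror the parameters above,
// but this struct is not part of the SDK.
struct EmbeddedOptions {
    bool gpuEnabled = false;
    int numThreads = 1;
};

constexpr int kMaxThreads = 4;  // maximum thread count stated above

// Clamp a requested thread count to the supported range [1, 4].
int clampThreads(int requested) {
    return std::clamp(requested, 1, kMaxThreads);
}

// Prefer the GPU when the caller reports that one is usable. Otherwise the
// thread count follows the hardware, up to the supported maximum.
EmbeddedOptions chooseOptions(bool gpuAvailable) {
    EmbeddedOptions options;
    options.gpuEnabled = gpuAvailable;
    // hardware_concurrency() may return 0 when the value is unknown;
    // clampThreads() turns that into a single thread.
    const unsigned hardwareThreads = std::thread::hardware_concurrency();
    options.numThreads = clampThreads(static_cast<int>(hardwareThreads));
    return options;
}
```

Using every available thread is not always the right choice: if the device is already under load from other applications, a lower thread count might give more predictable latency.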

Performance benchmark data on popular CPUs and GPUs

As stated above, multiple factors affect the performance of an embedded content safety model. We highly recommend that you test the model on your device and tune the parameters to fit your application's requirements.

We also run performance benchmark tests on various popular PC CPUs and GPUs. Keep in mind that even on the same CPU, performance can vary depending on CPU and memory load. Use the benchmark data as a reference when you consider whether embedded content safety can operate on your device. For the best results, test on your intended device and in your specific application scenario.

The sample code includes snippets that monitor performance metrics such as memory usage and inference time.
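Independent of the sample, inference time can be measured with a small wall-clock timer. The following sketch is not taken from the sample code; `medianLatencyMs` is a hypothetical helper, and `work` stands in for whatever call runs one analysis on your device.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Run `work` `runs` times and return the median wall-clock latency in
// milliseconds. Returns 0.0 when `runs` is not positive.
double medianLatencyMs(const std::function<void()>& work, int runs) {
    if (runs <= 0) {
        return 0.0;
    }
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    // Odd count: middle sample. Even count: mean of the two middle samples.
    return samples.size() % 2 == 1 ? samples[mid]
                                   : (samples[mid - 1] + samples[mid]) / 2.0;
}
```

The median is used rather than the mean so that a single slow run, for example the first call that loads the model, does not dominate the result.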

Text analysis performance

CPU: Intel(R) Core(TM) i9-14900HX

AACS_NUM_THREADS	CPU Memory Utilization	Latency
1	940 MB	260 ms
2	940 MB	164 ms
3	940 MB	125 ms
4	940 MB	103 ms

CPU: Intel(R) Core(TM) Ultra 5 125H

AACS_NUM_THREADS	CPU Memory Utilization	Latency
1	940 MB	320 ms
2	940 MB	180 ms
3	940 MB	125 ms
4	940 MB	105 ms

GPU: NVIDIA GeForce RTX 4060 Laptop GPU

GPU Memory Utilization	Latency
940 MB	26 ms

GPU: Intel(R) Arc(TM) Graphics

CPU Memory Utilization	Latency
940 MB	80 ms
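The CPU tables above can be read in terms of speedup and parallel efficiency, which shows how much each added thread contributes. The helper names below are illustrative; the latency values in the note after the code come from the tables.

```cpp
// Speedup of a multi-threaded run relative to the single-thread run.
constexpr double speedup(double singleThreadMs, double multiThreadMs) {
    return singleThreadMs / multiThreadMs;
}

// Parallel efficiency: speedup divided by the number of threads used.
// A value of 1.0 would mean perfectly linear scaling.
constexpr double efficiency(double singleThreadMs, double multiThreadMs,
                            int threads) {
    return speedup(singleThreadMs, multiThreadMs) / threads;
}
```

For the first CPU, four threads give a speedup of 260 / 103, about 2.52, for an efficiency of about 0.63. For the Core Ultra 5 125H, the speedup is 320 / 105, about 3.05, for an efficiency of about 0.76. In both cases the gain from each extra thread shrinks as the count rises, so a lower thread count might be a reasonable trade-off on a busy device.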