How to configure content filters with Azure OpenAI Service

Note

All customers can modify the content filters and configure the severity thresholds (low, medium, high). Approval is required to turn the content filters partially or fully off. Only managed customers may apply for full content filtering control via this form: Azure OpenAI Limited Access Review: Modified Content Filters.

The content filtering system integrated into Azure OpenAI Service runs alongside the core models. It uses an ensemble of multi-class classification models to detect four categories of harmful content (violence, hate, sexual, and self-harm) at four severity levels (safe, low, medium, and high), and optional binary classifiers for detecting jailbreak risk, existing text, and code in public repositories. The default content filtering configuration filters at the medium severity threshold for all four content harm categories, for both prompts and completions: content detected at severity level medium or high is filtered, while content detected at severity level low or safe is not. Learn more about content categories, severity levels, and the behavior of the content filtering system here. The jailbreak risk, protected material text, and protected material code models are optional and off by default; the configurability feature allows all customers to turn them on or off per their scenario. Some models are required to be on for certain scenarios to retain coverage under the Customer Copyright Commitment.
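
For reference, this is how filtering surfaces at the API level: a filtered prompt is rejected with an error carrying the code content_filter, while a filtered completion comes back with a finish_reason of content_filter. The following is a minimal sketch, assuming the openai Python SDK v1.x; the endpoint, key, API version, and deployment name are placeholders.

```python
# A minimal sketch of how content filtering surfaces in API responses.
# Assumes the openai Python SDK v1.x; endpoint, key, and deployment
# name below are placeholders for your own resource values.
from openai import AzureOpenAI, BadRequestError

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-API-KEY",
    api_version="2024-02-01",
)

try:
    response = client.chat.completions.create(
        model="YOUR-DEPLOYMENT-NAME",
        messages=[{"role": "user", "content": "Hello"}],
    )
    choice = response.choices[0]
    if choice.finish_reason == "content_filter":
        # The completion was truncated or blocked by the output filters.
        print("Completion filtered by the content filtering system.")
    else:
        print(choice.message.content)
except BadRequestError as e:
    # A filtered prompt is rejected with HTTP 400 and code "content_filter".
    print("Prompt rejected:", e)
```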

Content filters can be configured at the resource level. Once a new configuration is created, it can be associated with one or more deployments. For more information about model deployment, see the resource deployment guide.
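
Configurations can also be managed outside the Studio UI, as raiPolicies sub-resources of the Azure OpenAI account in the management REST API. The sketch below is illustrative only: it assumes the 2023-10-01-preview management API version and its contentFilters schema, and all subscription, resource group, account, and policy names are placeholders. Verify the exact field names and their semantics against the current REST reference before use.

```python
# A hedged sketch: create a custom content filtering configuration as a
# raiPolicies sub-resource via the Azure management REST API. Assumes the
# 2023-10-01-preview API version; field names and semantics may differ in
# later versions, and all identifiers below are placeholders.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://management.azure.com/.default")

url = (
    "https://management.azure.com/subscriptions/SUB-ID"
    "/resourceGroups/RG-NAME/providers/Microsoft.CognitiveServices"
    "/accounts/ACCOUNT-NAME/raiPolicies/MyCustomPolicy"
    "?api-version=2023-10-01-preview"
)

body = {
    "properties": {
        "basePolicyName": "Microsoft.Default",
        "contentFilters": [
            # One entry per category and direction; the severity threshold
            # semantics should be checked against the REST reference.
            {"name": "hate", "source": "Prompt", "enabled": True,
             "blocking": True, "allowedContentLevel": "Medium"},
            {"name": "violence", "source": "Completion", "enabled": True,
             "blocking": True, "allowedContentLevel": "Medium"},
        ],
    }
}

resp = requests.put(url, json=body,
                    headers={"Authorization": f"Bearer {token.token}"})
resp.raise_for_status()
```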

The configurability feature is available in preview and allows customers to adjust the settings, separately for prompts and completions, to filter content for each content category at different severity levels as described in the table below. Content detected at the 'safe' severity level is labeled in annotations but is not subject to filtering and isn't configurable.

| Severity filtered | Configurable for prompts | Configurable for completions | Description |
|---|---|---|---|
| Low, medium, high | Yes | Yes | Strictest filtering configuration. Content detected at severity levels low, medium, and high is filtered. |
| Medium, high | Yes | Yes | Default setting. Content detected at severity level low isn't filtered; content at medium and high is filtered. |
| High | Yes | Yes | Content detected at severity levels low and medium isn't filtered. Only content at severity level high is filtered. |
| No filters | If approved* | If approved* | No content is filtered regardless of severity level detected. Requires approval*. |

* Only approved customers have full content filtering control and can turn the content filters partially or fully off. Only managed customers can apply for full content filtering control via this form: Azure OpenAI Limited Access Review: Modified Content Filters

Customers are responsible for ensuring that applications integrating Azure OpenAI comply with the Code of Conduct.

| Filter category | Default setting | Applied to prompt or completion? | Description |
|---|---|---|---|
| Jailbreak risk detection | Off | Prompt | Can be turned on to filter or annotate user prompts that might present a jailbreak risk. For more information about consuming annotations, see Azure OpenAI Service content filtering. |
| Protected material - code | Off | Completion | Can be turned on to get the example citation and license information in annotations for code snippets that match any public code sources. For more information about consuming annotations, see the content filtering concepts guide. |
| Protected material - text | Off | Completion | Can be turned on to identify and block known text content from being displayed in the model output (for example, song lyrics, recipes, and selected web content). |
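
When these optional models are enabled, their results appear alongside the severity annotations in the API response. The following is a minimal sketch of consuming those annotations, assuming the openai Python SDK v1.x and a client configured as in the earlier example; the annotation field names follow the documented format but should be verified against the content filtering concepts guide.

```python
# A hedged sketch: reading optional-model annotations from a chat completions
# response. Assumes the openai Python SDK v1.x; `client` is an AzureOpenAI
# client configured as in the earlier example, and the deployment name is a
# placeholder. Verify field names against the content filtering concepts guide.
response = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",
    messages=[{"role": "user", "content": "Write a short poem."}],
)

raw = response.model_dump()  # annotations ride along as extra response fields

# Prompt-side annotations (for example, jailbreak risk detection).
for entry in raw.get("prompt_filter_results", []):
    jailbreak = entry.get("content_filter_results", {}).get("jailbreak")
    if jailbreak:
        print("jailbreak detected:", jailbreak.get("detected"))

# Completion-side annotations (for example, protected material).
for choice in raw.get("choices", []):
    results = choice.get("content_filter_results", {})
    for key in ("protected_material_text", "protected_material_code"):
        if key in results:
            print(key, "->", results[key])
```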

Configuring content filters via Azure OpenAI Studio (preview)

The following steps show how to set up a customized content filtering configuration for your resource.

  1. Go to Azure OpenAI Studio and navigate to the Content Filters tab (in the bottom left navigation, as designated by the red box below).

    Screenshot of the AI Studio UI with Content Filters highlighted

  2. Create a new customized content filtering configuration.

    Screenshot of the content filtering configuration UI with create selected

    This leads to the following configuration view, where you can choose a name for the custom content filtering configuration.

    Screenshot of the content filtering configuration UI

  3. This is the view of the default content filtering configuration, where content is filtered at the medium and high severity levels for all categories. You can modify the severity level separately for user prompts and model completions (prompt configuration is in the left column and completion configuration is in the right column, as designated with the blue boxes below) for each of the four content categories (listed on the left side of the screen, as designated with the green box below). Three severity levels are configurable for each category: low, medium, and high. Use the slider to set the severity threshold.

    Screenshot of the content filtering configuration UI with user prompts and model completions highlighted

  4. If you determine that your application or usage scenario requires stricter filtering for some or all content categories, you can configure the settings, separately for prompts and completions, to filter at more severity levels than the default setting. An example is shown in the image below, where the filtering level for user prompts is set to the strictest configuration for hate and sexual, with low severity content filtered along with content classified as medium and high severity (outlined in the red box below). In the example, the filtering levels for model completions are set at the strictest configuration for all content categories (blue box below). With this modified filtering configuration in place, low, medium, and high severity content will be filtered for the hate and sexual categories in user prompts; medium and high severity content will be filtered for the self-harm and violence categories in user prompts; and low, medium, and high severity content will be filtered for all content categories in model completions.

    Screenshot of the content filtering configuration with low, medium, high, highlighted.

  5. If your use case was approved for modified content filters as outlined above, you have full control over content filtering configurations and can choose to turn filtering partially or fully off. In the image below, filtering is turned off for violence (green box below), while default configurations are retained for the other categories. While this disables the filter functionality for violence, content is still annotated. To turn off all filters and annotations, toggle off Filters and annotations (red box below).

    Screenshot of the content filtering configuration with self harm and violence set to off.

  6. To turn on the optional models, select any of the checkboxes on the left-hand side. When an optional model is turned on, you can indicate whether it should Annotate or Filter.

  7. Selecting Annotate runs the respective model and returns annotations via the API response, but it doesn't filter content. In addition to annotations, you can also choose to filter content by switching the Filter toggle to on.

  8. You can create multiple content filtering configurations as per your requirements.

    Screenshot of multiple content configurations in the Azure portal.

  9. Next, to make a custom content filtering configuration operational, assign it to one or more deployments in your resource. To do this, go to the Deployments tab and select Edit deployment (outlined near the top of the screen in a red box below). A programmatic alternative is sketched after these steps.

    Screenshot of the content filtering configuration with edit deployment highlighted.

  10. Go to advanced options (outlined in the blue box below) and select the content filter configuration suitable for that deployment from the Content Filter dropdown (outlined near the bottom of the dialog box in the red box below).

    Screenshot of edit deployment configuration with advanced options selected.

  11. Select Save and close to apply the selected configuration to the deployment.

    Screenshot of edit deployment configuration with content filter selected.

  12. You can also edit and delete a content filter configuration if required. To do this, navigate to the Content Filters tab and select the desired action (options outlined near the top of the screen in the red box below). You can edit or delete only one filtering configuration at a time.

    Screenshot of content filter configuration with edit and delete highlighted.

    Note

    Before deleting a content filtering configuration, you will need to unassign it from any deployment in the Deployments tab.
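
As referenced in step 9, a configuration can also be assigned to a deployment programmatically by setting the deployment's raiPolicyName property. The following is a hedged sketch, assuming the 2023-10-01-preview management API version; the subscription, resource group, account, deployment, policy, model, and sku values are all placeholders to replace with your own, and the exact schema should be verified against the current REST reference.

```python
# A hedged sketch: assign a custom content filtering configuration to a
# deployment by setting its raiPolicyName property via the management REST
# API. Assumes the 2023-10-01-preview API version; all names, model, and
# sku values below are placeholders.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://management.azure.com/.default")

url = (
    "https://management.azure.com/subscriptions/SUB-ID"
    "/resourceGroups/RG-NAME/providers/Microsoft.CognitiveServices"
    "/accounts/ACCOUNT-NAME/deployments/DEPLOYMENT-NAME"
    "?api-version=2023-10-01-preview"
)

body = {
    "sku": {"name": "Standard", "capacity": 1},
    "properties": {
        "model": {"format": "OpenAI", "name": "gpt-35-turbo", "version": "0613"},
        "raiPolicyName": "MyCustomPolicy",  # the configuration created earlier
    },
}

resp = requests.put(url, json=body,
                    headers={"Authorization": f"Bearer {token.token}"})
resp.raise_for_status()
```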

Best practices

We recommend informing your content filtering configuration decisions through an iterative identification (for example, red team testing, stress-testing, and analysis) and measurement process to address the potential harms that are relevant for a specific model, application, and deployment scenario. After you implement mitigations such as content filtering, repeat measurement to test effectiveness. Recommendations and best practices for Responsible AI for Azure OpenAI, grounded in the Microsoft Responsible AI Standard, can be found in the Responsible AI Overview for Azure OpenAI.

Next steps