Content filtering annotations

Azure OpenAI in Azure AI Foundry Models provides content filtering annotations to help you understand the content filtering results for your requests. Annotations can be enabled even for filters and severity levels that aren't configured to block requests.

Standard content filters

When annotations are enabled as shown in the code snippets below, the following information is returned by the API for the categories hate and fairness, sexual, violence, and self-harm:

  • content filtering category (hate, sexual, violence, self_harm)
  • severity level (safe, low, medium, or high) within each content category
  • filtering status (true or false)
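For example, here's a minimal sketch of reading these fields from a response. It assumes `response` is the object returned by the Python example later in this article, and the field names follow the example output shown there.

# Minimal sketch: walk the per-category annotations on each choice.
# Assumes `response` is the object returned by the Python example below.
data = response.model_dump()

for choice in data["choices"]:
    for category, result in choice.get("content_filter_results", {}).items():
        # Standard categories report a severity; optional models report "detected".
        severity = result.get("severity")         # safe, low, medium, or high
        filtered = result.get("filtered", False)  # true when the content was removed
        print(f"{category}: severity={severity}, filtered={filtered}")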

Optional models

Optional models can be set to annotate mode (returns information when content is flagged, but not filtered) or filter mode (returns information when content is flagged and filtered).
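In practice, an application should treat the two flags differently. Here's a minimal sketch of interpreting one optional model's annotation object; the helper name is hypothetical.

def classify_annotation(result: dict) -> str:
    # Hypothetical helper: interpret one optional model's annotation object.
    if result.get("filtered"):
        # Filter mode: the content was flagged and removed.
        return "blocked"
    if result.get("detected"):
        # Annotate mode: the content was flagged but still returned.
        return "flagged"
    return "clean"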

When annotations are enabled as shown in the code snippets below, the following information is returned by the API for each optional model:

| Model | Output |
|---|---|
| Prompt Shield for user prompt attacks | detected (true or false), filtered (true or false) |
| Prompt Shield for indirect attacks | detected (true or false), filtered (true or false) |
| Protected material text | detected (true or false), filtered (true or false) |
| Protected material code | detected (true or false), filtered (true or false), example citation of the public GitHub repository where the code snippet was found, the license of the repository |
| Groundedness | detected (true or false), filtered (true or false), details (completion_end_offset, completion_start_offset) |

When displaying code in your application, we strongly recommend that the application also displays the example citation from the annotations. Compliance with the cited license may also be required for Customer Copyright Commitment coverage.
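Here's a minimal sketch of pulling that citation out of the annotations for display. It assumes `data` is the response dictionary from `response.model_dump()`, and the field names follow the example output later in this article.

# Minimal sketch: surface the protected material code citation for display.
result = data["choices"][0]["content_filter_results"].get("protected_material_code", {})

if result.get("detected"):
    citation = result.get("citation", {})
    # Display the source repository and its license next to the generated code.
    print(f"Source: {citation.get('URL')}")
    print(f"License: {citation.get('license')}")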

See the following table for annotation mode availability in each API version:

| Filter category | 2024-10-01-preview | 2024-02-01 GA | 2024-04-01-preview | 2023-10-01-preview | 2023-06-01-preview |
|---|---|---|---|---|---|
| Hate | ✅ | ✅ | ✅ | ✅ | ✅ |
| Violence | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sexual | ✅ | ✅ | ✅ | ✅ | ✅ |
| Self-harm | ✅ | ✅ | ✅ | ✅ | ✅ |
| Prompt Shield for user prompt attacks | ✅ | ✅ | ✅ | ✅ | ✅ |
| Prompt Shield for indirect attacks | ✅ | | | | |
| Protected material text | ✅ | ✅ | ✅ | ✅ | ✅ |
| Protected material code | ✅ | ✅ | ✅ | ✅ | ✅ |
| Profanity blocklist | ✅ | ✅ | ✅ | ✅ | ✅ |
| Custom blocklist | ✅ | ✅ | ✅ | ✅ | ✅ |
| Groundedness¹ | ✅ | | | | |

¹ Groundedness detection is available only in streaming scenarios; it isn't available in non-streaming scenarios. The following regions support groundedness detection: Central US, East US, France Central, and Canada East.
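Because groundedness detection is only returned in streaming scenarios, its annotations arrive on the streamed chunks. Here's a minimal sketch of a streaming request; it assumes the AzureOpenAI `client` from the code example below, a placeholder deployment name, and a content filtering configuration with groundedness detection enabled. The nesting of the ungrounded_material field is assumed to follow the Groundedness examples at the end of this article.

# Minimal sketch: stream a chat completion and inspect per-chunk annotations.
# Assumes the AzureOpenAI `client` from the example below; "gpt-35-turbo" is a
# placeholder deployment name.
stream = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[{"role": "user", "content": "{Example prompt}"}],
    stream=True,
)

for chunk in stream:
    data = chunk.model_dump()
    for choice in data.get("choices", []):
        # Content filter annotations, when present, ride along on each chunk.
        annotations = choice.get("content_filter_results") or {}
        if "ungrounded_material" in annotations:
            print(annotations["ungrounded_material"])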

Code examples

The following code snippet shows how to request and view content filter annotations in Python.

# os.getenv() for the endpoint and key assumes that you are using environment variables.

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-03-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # model = "deployment_name".
    messages=[
        # Content that is detected at severity level medium or high is filtered,
        # while content detected at severity level low isn't filtered by the content filters.
        {
            "role": "user",
            "content": "{Example prompt where a severity level of low is detected}",
        }
    ],
)

print(response.model_dump_json(indent=2))

Output

{ 
  "choices": [ 
    { 
      "content_filter_results": { 
        "hate": { 
          "filtered": false, 
          "severity": "safe" 
        }, 
        "protected_material_code": { 
          "citation": { 
            "URL": " https://github.com/username/repository-name/path/to/file-example.txt", 
            "license": "EXAMPLE-LICENSE" 
          }, 
          "detected": true,
          "filtered": false 
        }, 
        "protected_material_text": { 
          "detected": false, 
          "filtered": false 
        }, 
        "self_harm": { 
          "filtered": false, 
          "severity": "safe" 
        }, 
        "sexual": { 
          "filtered": false, 
          "severity": "safe" 
        }, 
        "violence": { 
          "filtered": false, 
          "severity": "safe" 
        } 
      }, 
      "finish_reason": "stop", 
      "index": 0, 
      "message": { 
        "content": "Example model response will be returned ", 
        "role": "assistant" 
      } 
    } 
  ], 
  "created": 1699386280, 
  "id": "chatcmpl-8IMI4HzcmcK6I77vpOJCPt0Vcf8zJ", 
  "model": "gpt-35-turbo-instruct", 
  "object": "text.completion",
  "usage": { 
    "completion_tokens": 40, 
    "prompt_tokens": 11, 
    "total_tokens": 417 
  },  
  "prompt_filter_results": [ 
    { 
      "content_filter_results": { 
        "hate": { 
          "filtered": false, 
          "severity": "safe" 
        }, 
        "jailbreak": { 
          "detected": false, 
          "filtered": false 
        }, 
        "profanity": { 
          "detected": false, 
          "filtered": false 
        }, 
        "self_harm": { 
          "filtered": false, 
          "severity": "safe" 
        }, 
        "sexual": { 
          "filtered": false, 
          "severity": "safe" 
        }, 
        "violence": { 
          "filtered": false, 
          "severity": "safe" 
        } 
      }, 
      "prompt_index": 0 
    } 
  ]
} 

For details on the inference REST API endpoints for Azure OpenAI and how to create chat and completions requests, see the Azure OpenAI REST API reference. Annotations are returned for all scenarios when you use any preview API version starting from 2023-06-01-preview, as well as the GA API version 2024-02-01.
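As the example output shows, annotations appear in two places: content_filter_results on each choice covers the completion, and prompt_filter_results covers the input. Here's a minimal sketch of checking the prompt-side annotations; the field names follow the example output above.

# Minimal sketch: inspect prompt-side annotations from the response above.
data = response.model_dump()

for prompt_result in data.get("prompt_filter_results", []):
    results = prompt_result["content_filter_results"]
    # Optional models such as jailbreak report detected/filtered flags.
    if results.get("jailbreak", {}).get("detected"):
        print(f"Prompt {prompt_result['prompt_index']}: possible user prompt attack detected")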

Groundedness

Annotate mode

Returns offsets referencing the ungrounded completion content.

{ 
  "ungrounded_material": { 
    "details": [ 
       { 
         "completion_end_offset": 127, 
         "completion_start_offset": 27 
       } 
   ], 
    "detected": true, 
    "filtered": false 
 } 
} 
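The offsets index into the completion text, so an application can, for example, highlight the ungrounded span. Here's a minimal sketch; `completion_text` and `annotation` are assumed inputs shaped like the JSON above.

# Minimal sketch: use the offsets to extract each ungrounded span.
# `completion_text` is the completion string; `annotation` is the
# ungrounded_material object shown above.
for detail in annotation.get("details", []):
    start = detail["completion_start_offset"]
    end = detail["completion_end_offset"]
    print("Ungrounded span:", completion_text[start:end])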

Filter mode

Blocks completion content when ungrounded completion content is detected.

{ "ungrounded_material": { 
    "detected": true, 
    "filtered": true 
  } 
}