OpenAI Content Filtering Low/Medium/High vs. Moderations API

Jimstacy2 20 Reputation points
2024-05-15T18:24:01.5866667+00:00

Hi,

I've been testing with OpenAI's APIs directly, but now that I'm moving my product to production I'm going to use Azure OpenAI. I've been using the Moderations API, which returns per-category scores, but I noticed that Azure uses low/medium/high severity levels instead of scores: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter?tabs=definitions,python-new. Is there a mapping from the OpenAI Moderations API scores to those low/medium/high thresholds?

https://platform.openai.com/docs/guides/moderation

Azure OpenAI Service

Accepted answer
  1. navba-MSFT 17,900 Reputation points Microsoft Employee
    2024-05-16T03:09:27.4566667+00:00

    @Jimstacy2 Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

    The Azure OpenAI Service and OpenAI’s Moderation API use different systems for content filtering and moderation. See here.

    OpenAI’s Moderation API provides a dictionary of per-category raw scores output by the model, denoting the model’s confidence that the input violates OpenAI’s policy for that category. Each value is between 0 and 1, where higher values denote higher confidence. These scores should not be interpreted as probabilities. See here.
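    For illustration, here is a minimal sketch of reading those scores with the openai Python SDK (v1.x); the sample text is a placeholder and the key is assumed to come from the OPENAI_API_KEY environment variable:

    ```python
    # Minimal sketch (openai Python SDK v1.x): reading the per-category
    # raw scores returned by the Moderations API.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    result = client.moderations.create(input="Sample text to check.").results[0]

    # category_scores holds one raw score per category, each between 0 and 1;
    # higher values mean higher model confidence, not probabilities.
    for category, score in result.category_scores.model_dump().items():
        print(f"{category}: {score:.4f}")

    # `flagged` is OpenAI's own binary verdict for the input as a whole.
    print("flagged:", result.flagged)
    ```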

    On the other hand, Azure OpenAI Service uses a content filtering system that works alongside the core models. This system detects four categories of harmful content (violence, hate, sexual, and self-harm), each at four severity levels (safe, low, medium, and high). The default configuration filters at the medium threshold for all four categories, for both prompts and completions: content detected at severity medium or high is filtered, while content detected at severity low (or safe) is not.

    Below is a sample streaming response from Azure OpenAI for the content filtering (two server-sent events):

    ```
    data: {
      "id": "", "object": "", "created": 0, "model": "",
      "choices": [{
        "index": 0,
        "finish_reason": null,
        "content_filter_results": {
          "hate": {"filtered": false, "severity": "safe"},
          "self_harm": {"filtered": false, "severity": "safe"},
          "sexual": {"filtered": false, "severity": "safe"},
          "violence": {"filtered": false, "severity": "safe"}
        },
        "content_filter_offsets": {"check_offset": 65, "start_offset": 65, "end_offset": 1056}
      }],
      "usage": null
    }

    data: {
      "id": "", "object": "", "created": 0, "model": "",
      "choices": [{
        "index": 0,
        "finish_reason": "content_filter",
        "content_filter_results": {
          "protected_material_text": {"detected": true, "filtered": true}
        },
        "content_filter_offsets": {"check_offset": 65, "start_offset": 65, "end_offset": 1056}
      }],
      "usage": null
    }
    ```

    More info here.
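    If you call the service without streaming, the same annotations appear on the response object. Here is a rough sketch using the openai Python SDK (v1.x); the endpoint, key, API version, and deployment name are placeholders for illustration. Note that content_filter_results is an Azure-specific extra field, so it is not part of the SDK's typed models; one way to read it is via model_dump():

    ```python
    # Sketch: reading Azure's per-category severity from a non-streaming
    # chat completion. Endpoint, key, api_version, and deployment name
    # below are placeholder assumptions.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://my-resource.openai.azure.com",
        api_key="<your-key>",
        api_version="2024-02-01",
    )

    response = client.chat.completions.create(
        model="my-gpt-deployment",  # your deployment name
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # content_filter_results is Azure-specific and not covered by the SDK's
    # typed models, so read it from the dumped dict.
    choice = response.choices[0].model_dump()
    for category, verdict in choice.get("content_filter_results", {}).items():
        print(category, verdict)  # e.g. hate {'filtered': False, 'severity': 'safe'}
    ```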

    So, there isn’t a direct mapping between the scores from OpenAI’s Moderation API and the low/medium/high thresholds in Azure’s content filtering system.

    However, you can still create a rough mapping based on typical score ranges and their corresponding severity levels in Azure's system.

    Here's a possible approach for creating a mapping (a code sketch follows the list):

    • Low confidence (0-0.3): could roughly correspond to "safe" or "low" severity in Azure's system.

    • Medium confidence (0.3-0.7): could correspond to "medium" severity in Azure's system.

    • High confidence (0.7-1): could correspond to "high" severity in Azure's system.
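    In code, that bucketing could look like the sketch below; the 0.3 and 0.7 cut-offs are illustrative guesses taken from the ranges above, not values published by either service:

    ```python
    # Rough, uncalibrated mapping from a 0-1 moderation score to Azure-style
    # severity labels. The 0.3 / 0.7 cut-offs are illustrative guesses only.
    def score_to_severity(score: float) -> str:
        if score < 0.3:
            return "low"  # could equally be "safe" at the bottom of this range
        if score < 0.7:
            return "medium"
        return "high"

    print(score_to_severity(0.12))  # low
    print(score_to_severity(0.55))  # medium
    print(score_to_severity(0.91))  # high
    ```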

    Keep in mind that this mapping is based on approximate score ranges. Therefore, you might need to do some testing and calibration to understand how the scores from OpenAI’s Moderation API correspond to the thresholds in Azure’s system.
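    One way to calibrate is to run the same texts through both services and compare OpenAI's raw scores with the severity labels Azure assigns to the prompt. A rough sketch follows; the endpoint, key, deployment name, and sample texts are placeholders, and prompt_filter_results is again an Azure-specific extra field read via model_dump():

    ```python
    # Calibration sketch: compare OpenAI moderation scores with the severity
    # labels Azure's prompt filter assigns to the same texts.
    from openai import AzureOpenAI, OpenAI

    oai = OpenAI()  # assumes OPENAI_API_KEY is set
    aoai = AzureOpenAI(
        azure_endpoint="https://my-resource.openai.azure.com",
        api_key="<your-key>",
        api_version="2024-02-01",
    )

    for text in ["sample text 1", "sample text 2"]:
        scores = oai.moderations.create(input=text).results[0].category_scores
        response = aoai.chat.completions.create(
            model="my-gpt-deployment",
            messages=[{"role": "user", "content": text}],
        )
        # prompt_filter_results annotates the prompt itself; it is an
        # Azure-specific extra field, hence model_dump().
        prompt_filter = response.model_dump().get("prompt_filter_results")
        print(text[:40], scores.model_dump(), prompt_filter)
    ```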

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.


0 additional answers
