Updates to content filtering?

Question


Jack Donoher

TL;DR: We had a prompt that was triggering the content filter. With no changes to the prompt or the filter settings, the same prompt now passes through the filter.

I work on an API that lets multiple products call various models hosted by different providers. One product was using an Azure OpenAI model, specifically GPT-4. Its prompt includes an incident report and instructions on how to convert that report into a JSON object. Even with the exact same incident and prompt, roughly 60% of the time the prompt would trigger the attached content filter for suspected self-harm. Note that the incident reports do contain descriptions of injuries, which is what I suspect is triggering the filter.

We were working on rewriting the prompt, looking at other models, etc. The issue began in July, and I left it for a couple of weeks. Having come back to look at a fix, I am now consistently getting a 200 response from the model, with the exact same prompt, model, and content filter settings. Since the issue was raised, GPT-5 has launched, and I wonder whether the LLM used by Azure's content filters has been updated and is now better at understanding the context of the prompt.

However, I'd really like concrete confirmation of this, and I can't find any documentation around this. Ideally, I'd be able to see a timeline of changes/updates to content filtering and see if this aligns with our changing results.

Does anyone know if this documentation exists? If not, how can I track changes? My concern is that another 'stealth' change to the filter may bring the problem back.


Answer 1

Hello Jack!

There isn’t a public changelog for the Azure OpenAI content filters (the classifiers behind categories such as suspected self-harm and sexual content). Microsoft does update the safety stack from time to time, and those updates aren’t always individually announced.

You can check the official concepts and configuration docs, which describe how the filters work and are where Microsoft usually reflects capability changes: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/content-filter

Notably, in Feb 2025 a request header was added so that you can specify a content filter configuration per request (handy for A/B tests and canaries): https://learn.microsoft.com/en-us/azure/ai-foundry/openai/whats-new
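As a rough illustration of the per-request header, here is a minimal Python sketch that builds (but does not send) a chat completions request. It assumes the header is named `x-policy-id` and that the endpoint follows the usual `/openai/deployments/<deployment>/chat/completions` path; please verify both against the linked docs before relying on them. `build_chat_request` and its parameters are hypothetical names chosen for this example.

```python
import json
import urllib.request


def build_chat_request(endpoint, deployment, api_version, api_key,
                       filter_config, messages):
    """Build a chat completions request that names a content filter config.

    Assumption: the per-request header is `x-policy-id` and its value is the
    name of a content filter configuration created in your resource.
    """
    url = (
        f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )
    headers = {
        "Content-Type": "application/json",
        "api-key": api_key,
        "x-policy-id": filter_config,  # assumed header name, check the docs
    }
    data = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

You could then send the same prompt once with your production configuration and once with a candidate configuration via `urllib.request.urlopen(request)` and compare the outcomes.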

The classifiers behind the filters are service-side and can be tuned without an API version change. Microsoft does not publish a day-by-day classifier changelog, so you won’t find an exact timeline that maps to your incident, only higher-level notes.
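Since there is no timeline to consult, one way to notice a silent change is to replay the same prompt on a schedule and log how often it gets blocked. Below is a minimal sketch of that idea. It assumes a blocked prompt comes back as HTTP 400 with `error.code` equal to `"content_filter"`; compare that with one of your own failing responses. `send` is a hypothetical callable you supply that makes one request and returns `(status_code, response_body)`.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Tuple


def is_filter_block(status: int, body: str) -> bool:
    """Return True if a response looks like a content filter rejection.

    Assumption: blocked prompts return HTTP 400 with a JSON body whose
    error.code is "content_filter".
    """
    if status != 400:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not JSON, so not a recognisable filter error
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return isinstance(error, dict) and error.get("code") == "content_filter"


def run_canary(send: Callable[[], Tuple[int, str]], attempts: int = 20) -> dict:
    """Call `send` repeatedly and summarise how often the filter blocked it."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    blocked = sum(1 for _ in range(attempts) if is_filter_block(*send()))
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "attempts": attempts,
        "blocked": blocked,
        "block_rate": blocked / attempts,
    }
```

Storing one such record per day gives you your own timeline: a block rate that moves from around 60% to 0% (or back) with no change on your side points to a service-side update.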

