Hi Sh alsehri,
You can use Model Leaderboards in the Azure AI Foundry portal to view the Safety Benchmarks of language models. Microsoft uses the following evaluations:
- HarmBench (standard, contextual, copyright) measures Attack Success Rate: the lower this rate, the more robust the model is against malicious prompts that try to make it generate harmful or inappropriate content.
- WMDP (Weapons of Mass Destruction Proxy) checks domain knowledge in cybersecurity, biosecurity, and chemical security, with accuracy as its metric. A high score means the model knows more about these sensitive domains, which can be useful but also calls for safeguards so it doesn’t inadvertently generate harmful content.
- ToxiGen assesses how well a model detects implicit toxic or hateful language, using the F1 score as its metric. Higher scores mean the model is better at avoiding or flagging harmful speech.
Here is the supporting documentation:
https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/model-benchmarks#safety-benchmarks-of-language-models
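To make the three metrics above more concrete, here is a minimal Python sketch of how each one is computed in general. The function names and the sample inputs are made up for illustration; this is not the code Microsoft uses to produce the leaderboard numbers.

```python
def attack_success_rate(attack_succeeded):
    """Share of adversarial prompts that produced harmful output (lower is better)."""
    if not attack_succeeded:
        raise ValueError("need at least one result")
    return sum(attack_succeeded) / len(attack_succeeded)


def accuracy(predictions, answers):
    """Share of multiple-choice questions answered correctly."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def f1_score(predicted_toxic, actually_toxic):
    """Harmonic mean of precision and recall for a binary toxic/not-toxic label."""
    pairs = list(zip(predicted_toxic, actually_toxic))
    tp = sum(p and a for p, a in pairs)          # flagged and really toxic
    fp = sum(p and not a for p, a in pairs)      # flagged but harmless
    fn = sum(not p and a for p, a in pairs)      # missed toxic text
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical results: 1 of 4 attack prompts succeeded.
print(attack_success_rate([True, False, False, False]))
# Hypothetical classifier output versus ground-truth labels.
print(f1_score([True, True, False, False], [True, False, True, False]))
```

When comparing models, keep the direction of each metric in mind: a lower Attack Success Rate and a higher F1 score are better, while a high WMDP accuracy needs to be read alongside the safeguards you have in place.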
You can also configure content filtering to detect four categories of harmful content (violence, hate, sexual, and self-harm), each at four severity levels (safe, low, medium, and high). Here is the supporting documentation:
https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/how-to/configure-content-filters?tabs=python&pivots=ai-foundry-portal
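As a rough illustration of how a severity threshold behaves, the Python sketch below blocks any category whose detected severity is at or above a chosen level. It is not the Azure SDK or the service's implementation: `blocked_categories`, the dictionary shape, and the default threshold are all hypothetical, and the real filter is configured in the portal as described in the link above.

```python
# Severity levels from least to most severe, as listed in the answer above.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]
CATEGORIES = ("violence", "hate", "sexual", "self_harm")


def blocked_categories(detected, threshold="medium"):
    """Return the categories whose severity is at or above the threshold.

    `detected` is a hypothetical mapping such as {"hate": "high"};
    categories missing from it are treated as "safe".
    """
    limit = SEVERITY_ORDER.index(threshold)
    return sorted(
        category
        for category in CATEGORIES
        if SEVERITY_ORDER.index(detected.get(category, "safe")) >= limit
    )


sample = {"violence": "low", "hate": "high"}
print(blocked_categories(sample))                   # default threshold
print(blocked_categories(sample, threshold="low"))  # stricter threshold
```

For an audience such as children, a stricter threshold like "low" blocks more content than the "medium" default assumed in this sketch.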
If your application is aimed at children, you should test it against multiple test cases to verify its safety, because Microsoft states that customers remain responsible for selecting an appropriate model for their use case and implementing appropriate measures. Here is the supporting documentation:
https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/foundry-models-overview
Feel free to accept this as an answer.
Thank you for reaching out to the Microsoft Q&A portal.