Developing Responsible Generative AI Applications and Features on Windows

This document provides an overview of recommended responsible development practices to use as you create applications and features on Windows with generative artificial intelligence.

Guidelines for responsible development of generative AI apps and features on Windows

Every team at Microsoft, including Windows, follows core principles and practices to responsibly build and ship AI. You can read more about Microsoft’s approach to responsible development in the first annual Responsible AI Transparency Report. Windows follows the foundational pillars of responsible AI (RAI) development (govern, map, measure, and manage), which are aligned to the National Institute of Standards and Technology (NIST) AI Risk Management Framework.

Govern - Policies, practices, and processes

Standards are the foundation of governance and compliance processes. Microsoft has developed its own Responsible AI Standard, including six principles that you can use as a starting point to develop your guidelines for responsible AI. We recommend that you build AI principles into your development lifecycle end to end, as well as into your processes and workflows for compliance with laws and regulations across privacy, security, and responsible AI. This spans early assessment of each AI feature, using tools like the AI Fairness Checklist and the Guidelines for Human-AI Interaction from Microsoft Research; monitoring and review of AI benchmarks, testing, and processes, using tools like a Responsible AI scorecard; and public documentation of your AI features’ capabilities and limitations, together with user disclosure and controls (notice, consent, data collection and processing information, and so on), in keeping with applicable privacy laws, regulatory requirements, and policies.

Map - Identify risk

Recommended practices for identifying risks include:

End-to-end testing

  • Red-teaming: The term red teaming has historically described systematic adversarial attacks for testing security vulnerabilities. With the rise of large language models (LLMs), the term has extended beyond traditional cybersecurity and evolved in common usage to describe many kinds of probing, testing, and attacking of AI systems. With LLMs, both benign and adversarial usage can produce potentially harmful outputs, which can take many forms, including harmful content such as hate speech, incitement or glorification of violence, or sexual content.

  • Model evaluation: In addition to testing end-to-end, it is also important to evaluate the model itself.

    • Model Card: For publicly available models, such as those on Hugging Face, you can check each model’s Model Card as a handy reference to understand whether a model is the right one for your use case. Read more about Model Cards.

    • Manual testing: Humans performing step-by-step tests without scripts is an important component of model evaluation that supports...

      • Measuring progress on a small set of priority issues. When mitigating specific harms, it's often most productive to keep manually checking progress against a small dataset until the harm is no longer observed before moving to automated measurement.

      • Defining and reporting metrics until automated measurement is reliable enough to use alone.

      • Spot-checking periodically to measure the quality of automatic measurement.

    • Automated testing: Automatically executed testing is also an important component of model evaluation that supports...

      • Measuring at a large scale with increased coverage to provide more comprehensive results.

      • Ongoing measurement to monitor for any regression as the system, usage, and mitigations evolve.

    • Model selection: Select a model that is suited for your purpose and educate yourself to understand its capabilities, limitations, and potential safety challenges. When testing your model, make sure that it produces results appropriate for your use. To get you started, destinations for Microsoft (and non-Microsoft/open source) model sources include:
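
Returning to the automated testing practice listed above under model evaluation, the idea of measuring at scale and watching for regressions can be sketched as a small check. This is a minimal illustration in Python, not a prescribed method: `evaluate`, `has_regressed`, the stand-in `fake_generate` and `fake_is_harmful` functions, and the 0.01 tolerance are all hypothetical names and values chosen for the example. In a real system, `generate` would call your model and `is_harmful` would be a classifier or a reviewed rubric.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EvalResult:
    total: int
    flagged: int

    @property
    def defect_rate(self) -> float:
        # Share of outputs flagged as harmful; 0.0 for an empty run.
        return self.flagged / self.total if self.total else 0.0


def evaluate(prompts: Iterable[str],
             generate: Callable[[str], str],
             is_harmful: Callable[[str], bool]) -> EvalResult:
    """Run every prompt through the model and count flagged outputs."""
    total = flagged = 0
    for prompt in prompts:
        total += 1
        if is_harmful(generate(prompt)):
            flagged += 1
    return EvalResult(total, flagged)


def has_regressed(current: EvalResult, baseline_rate: float,
                  tolerance: float = 0.01) -> bool:
    """True when the defect rate exceeds the baseline by more than tolerance."""
    return current.defect_rate > baseline_rate + tolerance


# Hypothetical stand-ins so the sketch runs without a real model or classifier.
def fake_generate(prompt: str) -> str:
    return prompt.upper()


def fake_is_harmful(output: str) -> bool:
    return "ATTACK" in output


result = evaluate(
    ["hello", "plan an attack", "weather today", "recipe"],
    fake_generate,
    fake_is_harmful,
)
print(result.defect_rate, has_regressed(result, baseline_rate=0.10))  # 0.25 True
```

Running the same check on every build, against the same prompt set, is one way to notice when a change to the system or its mitigations makes results worse. Periodic manual spot-checks of the flagged and unflagged outputs remain useful for judging whether the automated signal can be trusted.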

Measure - Assess risks and mitigation

Recommended practices include:

  • Assign a Content Moderator: Content Moderator checks text, image, and video content for material that is potentially offensive, risky, or otherwise undesirable. Learn more: Introduction to Content Moderator (Microsoft Learn Training).

    • Use content safety filters: This ensemble of multi-class classification models detects four categories of harmful content (violence, hate, sexual, and self-harm), each at four severity levels (safe, low, medium, and high). Learn more: How to configure content filters with Azure OpenAI Service.

    • Apply a meta-prompt: A meta-prompt is a system message included at the beginning of the prompt and is used to prime the model with context, instructions, or other information relevant to your use case. These instructions are used to guide the model’s behavior. Learn more: Creating effective security guardrails with metaprompt / system message engineering.

    • Utilize blocklists: A blocklist blocks the use of certain terms or patterns in a prompt. Learn more: Use a blocklist in Azure OpenAI.

    • Get familiar with the provenance of the model: Provenance is the history of ownership of a model, or the who-what-where-when, and is very important to understand. Who collected the data in a model? Who does the data pertain to? What kind of data is used? Where was the data collected? When was the data collected? Knowing where model data came from can help you assess its quality and reliability, and avoid any unethical, unfair, biased, or inaccurate data use.

    • Use a standard pipeline: Use one content moderation pipeline rather than pulling together parts piecemeal. Learn more: Understanding machine learning pipelines.

  • Apply UI mitigations: These provide important clarity to your user about capabilities and limitations of an AI-based feature. To help users and provide transparency about your feature, you can:

    • Encourage users to edit outputs before accepting them

    • Highlight potential inaccuracies in AI outputs

    • Disclose AI’s role in the interaction

    • Cite references and sources

    • Limit length of input and output where appropriate

    • Provide structure for input or output (for example, prompts must follow a standard format)

    • Prepare pre-determined responses for controversial prompts
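
Several of the practices above (a meta-prompt, a blocklist, severity-based content filtering, length limits, and pre-determined responses) can be combined into a single moderation pipeline. The following is a minimal sketch in Python under stated assumptions: it does not call Azure AI Content Safety or Azure OpenAI, and every name in it (`META_PROMPT`, `BLOCKLIST`, `check_prompt`, `check_output`, `respond`, the length limits, and the `generate` and `classify` callables) is hypothetical. Only the four harm categories and the four severity levels come from the description above.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict

# Severity levels and harm categories as described in the text above.
SEVERITY = {"safe": 0, "low": 1, "medium": 2, "high": 3}
CATEGORIES = ("violence", "hate", "sexual", "self_harm")

# Hypothetical configuration values, for illustration only.
META_PROMPT = "You are a writing assistant. Decline requests for harmful content."
BLOCKLIST = [re.compile(r"\bforbidden_term\b", re.IGNORECASE)]
MAX_INPUT_CHARS = 500
MAX_OUTPUT_CHARS = 2000
CANNED_RESPONSE = "I can't help with that topic."


@dataclass
class Decision:
    allowed: bool
    reason: str


def check_prompt(prompt: str) -> Decision:
    """Apply the input length limit, then the blocklist."""
    if len(prompt) > MAX_INPUT_CHARS:
        return Decision(False, "input_too_long")
    if any(pattern.search(prompt) for pattern in BLOCKLIST):
        return Decision(False, "blocklist")
    return Decision(True, "ok")


def check_output(severities: Dict[str, str], threshold: str = "medium") -> Decision:
    """Reject output whose severity in any category reaches the threshold.

    `severities` maps a category name to a severity label; categories
    missing from the mapping are treated as "safe".
    """
    limit = SEVERITY[threshold]
    for category in CATEGORIES:
        if SEVERITY[severities.get(category, "safe")] >= limit:
            return Decision(False, f"filtered:{category}")
    return Decision(True, "ok")


def respond(prompt: str,
            generate: Callable[[str, str], str],
            classify: Callable[[str], Dict[str, str]]) -> str:
    """One pipeline: prompt checks, meta-prompt, generation, output filter."""
    if not check_prompt(prompt).allowed:
        return CANNED_RESPONSE
    output = generate(META_PROMPT, prompt)[:MAX_OUTPUT_CHARS]
    if not check_output(classify(output)).allowed:
        return CANNED_RESPONSE
    return output
```

Keeping every check inside one `respond` function reflects the "standard pipeline" recommendation: each request passes through the same steps in the same order, rather than through checks assembled separately per feature. The threshold of "medium" is an arbitrary default for the sketch; the right value depends on your use case.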

Manage - Mitigate AI risks

Recommendations for mitigating AI risks include:

  • Abuse monitoring: This methodology detects and mitigates instances of recurring content and/or behaviors that suggest a service has been used in a manner that may violate the Code of Conduct or other applicable product terms. Learn more: Abuse Monitoring.

  • Phased delivery: Roll out your AI solution gradually so that you can handle incoming reports and concerns as they arise.

  • Incident response plan: For every high-priority risk, evaluate what will happen, how long it will take to respond to an incident, and what the response process will look like.

  • Ability to turn feature or system off: Provide functionality to turn the feature off if an incident is about to occur or has occurred that requires pausing the functionality to avoid further harm.

  • User access controls/blocking: Develop a way to block users who are misusing a system.

  • User feedback mechanism: Provide channels for detecting issues from the user’s side.

  • Responsible deployment of telemetry data: Identify, collect, and monitor signals that indicate user satisfaction or their ability to use the system as intended, ensuring you follow applicable privacy laws, policies, and commitments. Use telemetry data to identify gaps and improve the system.
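
Three of the recommendations above (phased delivery, the ability to turn a feature off, and user blocking) can share one gate that every request passes through. The sketch below is a minimal Python illustration under stated assumptions: the `FeatureGate` class, its method names, and the hash-based percentage bucketing are hypothetical choices for the example, not an API from Windows or Azure. A production system would typically load these settings from a remote configuration service so that they can change without redeploying.

```python
import hashlib


class FeatureGate:
    """Hypothetical gate combining a kill switch, phased rollout, and user blocking."""

    def __init__(self, rollout_percent: int):
        self.enabled = True                     # kill switch state
        self.rollout_percent = rollout_percent  # 0-100, share of users with access
        self.blocked_users = set()              # users blocked for misuse

    def _bucket(self, user_id: str) -> int:
        # Stable bucket in [0, 100): the same user always lands in the same
        # bucket, so raising rollout_percent only ever adds users.
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % 100

    def is_available(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        if user_id in self.blocked_users:
            return False
        return self._bucket(user_id) < self.rollout_percent

    def turn_off(self) -> None:
        """Pause the feature for everyone, for example during an incident."""
        self.enabled = False

    def block_user(self, user_id: str) -> None:
        """Deny access to a user who is misusing the system."""
        self.blocked_users.add(user_id)


gate = FeatureGate(rollout_percent=100)
gate.block_user("misusing-user")
print(gate.is_available("misusing-user"))  # False: blocked
gate.turn_off()
print(gate.is_available("any-user"))       # False: feature is off for everyone
```

Checking the kill switch first means that turning the feature off takes effect regardless of rollout stage or per-user state, which matches the intent of pausing functionality to avoid further harm.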

Tools and resources