
What is a Transparency Note?

An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it is deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, what its capabilities and limitations are, and how to achieve the best performance. Microsoft’s Transparency Notes are intended to help you understand how our AI technology works, the choices system owners can make that influence system performance and behavior, and the importance of thinking about the whole system, including the technology, the people, and the environment. You can use Transparency Notes when developing or deploying your own system, or share them with the people who will use or be affected by your system.

Microsoft's transparency notes are part of a broader effort at Microsoft to put our AI principles into practice.

For more information, see:

The basics of Microsoft Dragon Copilot

Dragon Copilot is a cloud‑based, AI‑powered documentation platform that helps clinicians create high‑quality draft medical notes more efficiently.

The system ingests speech audio (live dictation or ambient conversation), typed or pasted text, and natural‑language prompts. It returns structured outputs such as high‑accuracy speech‑to‑text transcriptions, draft clinical notes, and generative summaries or transformations (e.g., referral letters, after‑visit summaries).

Dragon Copilot is designed to augment, never replace, clinical expertise. All generated content must be reviewed and validated by the clinician before it becomes part of the medical record.

Key terms

Dragon Copilot: The end-to-end Microsoft solution for AI-assisted clinical documentation and related workflows.
Automatic Speech Recognition (ASR): A speech technology that not only converts spoken language into text but can also interpret commands and execute actions based on the recognized speech. ASR is more general than STT and includes STT.
Speech-to-text (STT): A speech recognition component that converts speech recordings into text. This may include automatic punctuation and, in the case of multi-speaker conversational speech, speaker tagging, also known as diarization (who said what).
Ambient note generation: Capability that records a natural clinician–patient dialogue and produces a draft clinical note summarizing the encounter.
Generative AI: Large-language-model (LLM) functionality that can create, transform, or summarize clinical content when triggered by templates or free-form prompts.
Customer data: Any data provided by, or on behalf of, the healthcare organization and processed by Dragon Copilot, including encounter recordings, transcriptions, clinical notes, orders, patient context, identifiers, and clinician inputs.
Preview feature: Functionality that is released for trial or early evaluation and may exhibit higher variability in accuracy or robustness.
Content filter: Azure OpenAI Service classifiers that detect and block disallowed content (violence, hate, self-harm, sexual) in prompts or outputs.
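
To make the "who said what" aspect of diarization concrete, the following minimal sketch shows one way a diarized transcript could be represented and rendered. The data structure, speaker labels, and sample dialogue are hypothetical and purely illustrative; they do not reflect Dragon Copilot's internal data formats.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One diarized utterance: who spoke, when, and the recognized text."""
    speaker: str          # e.g., "clinician" or "speaker_1" (hypothetical labels)
    start_seconds: float
    end_seconds: float
    text: str

def format_transcript(segments: list[Segment]) -> str:
    """Render a 'who said what' view of an ambient recording."""
    return "\n".join(f"[{s.speaker}] {s.text}" for s in segments)

# Illustrative only: a two-speaker encounter snippet.
encounter = [
    Segment("clinician", 0.0, 3.2, "What brings you in today?"),
    Segment("speaker_1", 3.5, 7.9, "I've had a cough for about two weeks."),
]
print(format_transcript(encounter))
```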

Capabilities

System behavior

Dragon Copilot is a software solution to support clinical documentation and workflows with artificial intelligence (AI). The AI systems in Dragon Copilot fall into three major categories, each providing distinctive value: front-end speech-to-text, ambient note generation, and generative AI. Where activated, each system is integrated within Dragon Copilot to deliver a single experience.

  1. Speech-to-text (STT) enables users to record their voice and receive dictations as highly accurate text. This area of capability also includes the ability to recognize spoken commands that affect a variety of functions, like inserting predefined content into a text field.

  2. Ambient note generation describes the ability to record a natural human conversation, typically between a clinician and a patient, and receive a summary of that conversation as clinical content. This capability enables Dragon Copilot to deliver clinical notes to users based on conversations with patients.

  3. Generative AI capabilities extend user value by supporting a variety of additional use cases and interactions.

    1. Predefined AI capabilities may be selected by users to run on demand. Some may also be configured or run automatically in certain circumstances. Note that not all capabilities are available in all markets, and availability may vary by market. Example predefined capabilities include:

      • Apply style: Applies your defined style preferences automatically.

      • Change pronouns: Changes the pronouns throughout the summary.

      • Get coaching: Provides you with feedback on what you can do differently to get better AI content.

      • Summarize encounter: Summarizes the encounter and key takeaways.

      • Summarize evidence: Produces rich insights about diagnoses in the Assessment & Plan with referenced evidence from the transcript and draft note.

      • Draft after-visit summary: Drafts a patient-friendly summary that provides details about visit education and instructions for patients and caregivers.

      • Draft referral letter: Drafts a starter referral letter for a physician or clinic.

    2. The freeform prompt interface (subject to availability) extends users' ability to submit prompts to create, edit, or summarize content or ask questions. An illustrative sketch of how these predefined capabilities and freeform prompts might be expressed appears after this list.
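
As a rough illustration of how the predefined capabilities and the freeform prompt interface listed above might map onto prompt text, the sketch below keys a few capability names to simple templates. The template wording, function, and dispatch logic are hypothetical assumptions for illustration, not Dragon Copilot's implementation.

```python
# Hypothetical prompt templates keyed by a few of the predefined capabilities listed above.
PREDEFINED_CAPABILITIES = {
    "apply_style": "Rewrite the draft note using the clinician's saved style preferences.",
    "change_pronouns": "Update pronouns throughout the note to: {pronouns}.",
    "summarize_encounter": "Summarize the encounter and list the key takeaways.",
    "draft_referral_letter": "Draft a starter referral letter addressed to {recipient}.",
}

def build_prompt(capability: str | None, freeform: str | None, **params: str) -> str:
    """Resolve either a predefined capability or a freeform prompt into prompt text."""
    if capability is not None:
        return PREDEFINED_CAPABILITIES[capability].format(**params)
    if freeform is not None:
        return freeform  # freeform prompt interface, subject to availability
    raise ValueError("Either a predefined capability or a freeform prompt is required.")

print(build_prompt("change_pronouns", None, pronouns="they/them"))
print(build_prompt(None, "Shorten the assessment section to three sentences."))
```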

The social context of clinical care means accuracy, privacy, and clinician oversight are paramount. The system therefore:

  • Requires patient consent (per organization policy) for recording.

  • Includes a timeline of activities and versioning so clinicians can rapidly verify critical sections and changes.

  • Employs healthcare-adapted AI modeling and specialized content filters to minimize accidental blocking of clinically necessary terms while still preventing disallowed content.

Potential Responsible AI concerns include over‑reliance on autogenerated text (“automation bias”), privacy of voice data, and fairness when models underperform on dialects or accents.

Use cases

For information on intended use cases and unsupported use cases, see: Use cases.

Limitations

Dragon Copilot is an assistive technology that creates draft medical documentation. Its effectiveness depends on clear audio, accurate speaker identification, and clinician oversight. Key limitations include:

  • Potential transcription errors due to background noise, overlapping speech, or accented speech not well represented in training data.

  • Summarization inaccuracies such as omitted findings or hallucinated details, especially in preview markets or languages.

  • Latency sensitivity in low‑bandwidth environments (>250 ms may affect real‑time dictation experience).

  • No collaborative editing – designed for a single authenticated user per documentation session.

  • Preview features may change, disappear, or have lower reliability.

When considering the appropriate ways to use Dragon Copilot, it is helpful to focus on several key areas: customization, installation, user experience design, data gathering, and training. Each of these areas plays a crucial role in ensuring the system's effectiveness and reliability in real-world scenarios.

Customization

Customization is vital for tailoring Dragon Copilot to meet specific institutional and clinician needs and preferences. Users can customize templates, prompts, and style preferences to fit their documentation requirements. For example, users can create and save prompts to the library, apply customizable templates, and configure style preferences to ensure that the generated notes align with their desired format. This level of customization allows for a more personalized and efficient documentation process.

Installation, configuration, and roll-out

Proper installation is crucial for the smooth operation of Dragon Copilot. Ensure that all prerequisites are met, such as having the necessary software versions and integrations in place, as well as appropriate devices (microphones, mobile devices). Additionally, for optimal performance, any system settings recommended by Dragon Copilot's documentation should be configured at both the administrative and user levels.

In addition to training and careful technical processes, clear and dedicated project management, including user selection and support, helps ensure Dragon Copilot effectiveness.

User experience design and accessibility

User experience design focuses on making Dragon Copilot intuitive and user-friendly. Users can choose between different views of the Dragon Copilot desktop app, such as standard, narrow, and compact views, to suit their workflow. Providing clear navigation options, easy access to prompts, and the ability to review and edit notes efficiently are critical aspects of user experience design.

Note

Microsoft is deeply committed to ensuring accessibility in all its products, including Dragon Copilot. Our goal is to create inclusive technology that empowers everyone, regardless of their abilities. Dragon Copilot is designed with accessibility in mind to ensure that all users can benefit from its advanced features. We strive to provide a seamless experience for users with disabilities by incorporating features such as voice commands, customizable views, and intuitive navigation.

The minimum standards for accessibility in Dragon Copilot include compliance with the Web Content Accessibility Guidelines (WCAG) 2.1, ensuring that the product is perceivable, operable, understandable, and robust for all users. Additionally, Dragon Copilot supports assistive technologies such as screen readers and voice recognition software, making it accessible to users with visual, auditory, and motor impairments. We continuously work to improve accessibility and welcome feedback from our users to ensure that Dragon Copilot meets their needs and expectations.

Data gathering

Effective data gathering is essential for training and improving the AI models used by Dragon Copilot. Collecting high-quality, diverse, and representative data from various clinical scenarios helps ensure that the system can accurately transcribe and generate notes. Users should follow their organizational policies and applicable regulations for acquiring consent from recorded subjects.

Having access to a comprehensive data set that includes different specialties and patient interactions can improve the system's accuracy and reliability when such a data set is used for AI training and quality assurance purposes. Data is used to benefit customers in the following ways:

  • Improving AI by refining on more representative data from Dragon Copilot.

  • Preventing AI model drift.

  • Taking advantage of and refining new and improved generic LLM and ASR technology.

  • Expanding feature functionality like note target changes, specialty optimizations, orders, and additional language support.

Training

Training users on how to use Dragon Copilot effectively is crucial for maximizing its benefits. Providing comprehensive training sessions, user guides, and support resources can help users understand the system's capabilities, limitations, and best practices. Regular training updates and feedback sessions can also address any issues and ensure that users are confident in using the system. In-depth training and support materials are provided within the Dragon Copilot app.

Technical limitations, operational factors and ranges

Expected reliable ranges by factor:

Ambient diarized speakers: ≤ 2. One speaker (the clinician) is identified as primary; all other speakers are aggregated.
Aggregate ambient recordings: up to 75 minutes. Recording lengths over 75 minutes are not supported.
Supported languages: see product documentation. Unsupported languages will not be used directly in note production.
Network latency: < 200 ms for STT. Higher latency may delay text streaming.
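
The ranges above can be read as simple pre-flight checks before a documentation session. The sketch below is a hypothetical illustration that mirrors the listed thresholds; the constant names and validation function are assumptions, not part of Dragon Copilot.

```python
MAX_RECORDING_MINUTES = 75     # aggregate ambient recordings beyond this are not supported
MAX_DIARIZED_SPEAKERS = 2      # the clinician plus one aggregated group of other speakers
MAX_STT_LATENCY_MS = 200       # higher latency may delay real-time text streaming

def check_operational_factors(recording_minutes: float,
                              diarized_speakers: int,
                              latency_ms: float) -> list[str]:
    """Return warnings for any factor outside the expected reliable range."""
    warnings = []
    if recording_minutes > MAX_RECORDING_MINUTES:
        warnings.append("Recording exceeds 75 minutes and is not supported.")
    if diarized_speakers > MAX_DIARIZED_SPEAKERS:
        warnings.append("More than two diarized speakers; additional speakers are aggregated.")
    if latency_ms >= MAX_STT_LATENCY_MS:
        warnings.append("Network latency may delay real-time text streaming.")
    return warnings

print(check_operational_factors(recording_minutes=80, diarized_speakers=3, latency_ms=250))
```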

System performance

Dragon Copilot AI performance is measured in multiple ways, including:

  • Word Error Rate (WER) for STT; a worked sketch of the standard WER calculation follows below.

  • Accuracy scores for ambient note generation and engineered generative AI, measured against standard data sets.

  • Latency for AI processes.

Additional standard measures apply to system performance, like service uptime.
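
For reference, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The sketch below implements that standard definition as an illustration; it is not the scoring harness used to evaluate Dragon Copilot.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance counting substitutions, deletions, insertions.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("patient reports chest pain", "patient reports chess pain"))  # 0.25
```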

Best practices for improving system performance

  • Use as directed according to Dragon Copilot's primary and intended use cases.

  • Optimize device or microphone selection and placement to get the highest quality audio signal possible for the recording type (directed speech or ambient recording).

  • Use versioning and the transcript as backup reference tools.

  • Understand and use style and configuration capabilities to tailor output.

  • When using generative AI, revise and retry prompts following prompting best practices.

Evaluation of Dragon Copilot

Each system was carefully designed and built using detailed clinical content specifications, evaluated through batteries of quantitative and qualitative tests, red-teamed for content and security vulnerabilities, and reviewed by clinical content subject matter experts for accuracy and usability. New AI models are deployed in production only after successful A/B tests that compare them with baseline models. Characteristics and performance were also evaluated against Responsible AI principles. Dragon Copilot also supports robust user feedback mechanisms that are used for continuous improvement.

Evaluating and integrating Dragon Copilot for your use

When adopting Dragon Copilot, we suggest the following:

  1. Pilot with representative data. Record a diverse sample of encounters for review. Seek feedback from users across multiple specialties.

  2. Design human oversight. Ensure clinicians understand they remain responsible for the final record. The UI should indicate AI‑generated sections distinctly and provide one‑click access to the source transcript.

  3. Document consent procedures. Incorporate a patient consent step before recording as appropriate for your organization.

Customer disclaimer

Neither Dragon Copilot nor the Azure OpenAI API are intended, designed, or made available to be: (1) a medical device, or (2) a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment.

Output from Dragon Copilot or the Azure OpenAI API does not reflect the opinions of Microsoft. The accuracy and reliability of the information provided by Dragon Copilot or the Azure OpenAI API may vary and are not guaranteed. Microsoft disclaims any liability for any damages resulting from the use or reliance on the information provided by Dragon Copilot or the Azure OpenAI API.

Medical device disclaimer

Microsoft products are not designed, intended, or made available as medical device(s), and are not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment or judgment. Microsoft does not warrant that Microsoft products will be sufficient for any medical purposes or meet the health or medical requirements of any person.

Responsible AI use in Dragon Copilot

At Microsoft, we are deeply committed to developing and deploying AI solutions that align with Microsoft's Responsible AI principles. Our approach ensures that our AI technologies are not only innovative but also ethical, transparent, and trustworthy. Below, we outline how our solution adheres to each of Microsoft's Responsible AI principles: Fairness, Reliability & Safety, Privacy & Security, Inclusiveness, Transparency, and Accountability.

Microsoft Dragon Copilot applies multiple operational safeguards, involving both a red-teaming process and continuous testing routines (a simplified sketch of the prompt and response checks follows this list):

  1. Internal Responsible AI guardrail tools: Before submitting prompts to the Large Language Model (LLM), we utilize several tools:

    • Guardlist: Flags specific words and phrases that fall into harmful categories such as hate speech, sexual content, violence, self-harm, protected material, and user prompt injection attacks.

    • Azure Content Safety: Provides scores against categories of harm related to these words and phrases.

    • Dynamic Blocklist: Enables dynamic updates of new phrases identified as harmful.

    • Allow list: Specifically created for the Dragon Copilot workflow to allow words flagged by Guardlist that are deemed necessary.

  2. Intent Classification Module: This module ensures that incoming prompts are within the scope of accepted intents and rejects prompts that are out of scope (e.g., non-medical requests, requests for medical advice, or clinical decision-making). It also helps mitigate harmful prompt-injection attempts.

  3. Context Grounding: The context provided to the LLM is grounded in specific note data. External sources and data from other sessions are not allowed.

  4. Post-response Responsible AI checks: The same Responsible AI guardrail routines are run after capturing the LLM response to ensure that the reply adheres to Responsible AI policies.

  5. Prompt Injection checks: These checks ensure that the LLM does not output content related to our internal processes, such as system prompts and guardrail checks.

  6. Thorough Testing Before Feature Release: Before each new AI-process feature release, Dragon Copilot undergoes a rigorous testing process, including:

    • Functional/Integration Testing: Ensures that the end-to-end component workflow works correctly.

    • Accuracy Evaluation of Responses: Based on multiple criteria, including:

      • Locale-specific output evaluation for global readiness.

      • Context-based tests using anonymized clinical notes and transcripts.

      • Evaluation metrics such as Groundedness, Completeness, Relevance, and Factuality.

    • Response Scoring: Responses are scored on a scale of 1 to 5, with issues that have a higher impact on patient care receiving lower scores.

    • Automated Tests: Running on Azure pipelines.

    • Load Testing: For performance benchmarking.
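
Taken together, steps 1 through 5 describe a pre- and post-processing pipeline around each LLM call. The sketch below outlines that flow at a high level; the function names and placeholder checks are assumptions for illustration, whereas the production system relies on Guardlist, Azure Content Safety, dynamic blocklists, the allow list, and the intent classification module described above.

```python
def guardrail_pipeline(prompt: str, note_context: str, call_llm) -> str:
    """Hypothetical outline of the pre- and post-response checks described above."""
    if contains_disallowed_content(prompt):           # step 1: Guardlist / Azure Content Safety / blocklists
        raise ValueError("Prompt blocked by content checks.")
    if not is_in_scope(prompt):                       # step 2: intent classification
        raise ValueError("Prompt rejected as out of scope.")
    grounded_prompt = f"{note_context}\n\n{prompt}"   # step 3: ground only in this session's note data
    response = call_llm(grounded_prompt)
    if contains_disallowed_content(response):         # step 4: post-response Responsible AI checks
        raise ValueError("Response blocked by content checks.")
    if leaks_internal_instructions(response):         # step 5: prompt injection / system prompt leakage checks
        raise ValueError("Response blocked for exposing internal instructions.")
    return response

# Placeholder predicates; the real checks use the guardrail tooling named above.
def contains_disallowed_content(text: str) -> bool:
    return False

def is_in_scope(text: str) -> bool:
    return True

def leaks_internal_instructions(text: str) -> bool:
    return False

# Minimal usage with a stand-in LLM call.
print(guardrail_pipeline("Summarize the encounter.", "Draft note text...", lambda p: "Summary draft..."))
```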

By adhering to these rigorous processes and principles, Dragon Copilot ensures that our AI solutions are not only cutting-edge but also align with the highest standards of ethical AI development. This commitment to responsible AI fosters trust and reliability in our technologies, ensuring they are safe and beneficial for all users.

For more information on Microsoft Responsible AI in general, see: