Ambient Audio Streaming WebSocket API reference

The Ambient Audio Streaming (AAS) 2.0 WebSocket API enables partners to stream ambient audio recordings in real time and submit them for downstream processing by Dragon Copilot.

The WebSocket transport is one of two streaming options (alongside gRPC). It exposes three endpoints:

Endpoint | Type | Description
GET /ws/retrieveConfiguration | Unary | Returns the service configuration for a given product, partner, and customer, including supported locales and recording duration limits.
GET /ws | Bidirectional streaming | Streams an ambient audio recording to the server. The client opens a session, streams audio data, and closes the recording.
GET /ws/startProcessing | Unary | Signals Dragon Copilot to begin processing a previously recorded ambient session.

Authentication

All WebSocket endpoints require bearer token authentication. Tokens can be passed via the Authorization header or the Sec-WebSocket-Protocol subprotocol header.

Supported token types:

  • S2S (Server-to-Server): Machine-to-machine token issued via MISE. After authentication, the service validates the calling application's identity against a configured allowlist.
  • Entra ID User Token: User-delegated token issued by Microsoft Entra ID.
  • EIS Bearer Token: JWT issued by the EHR Integration Service (EIS). See Token launch integration for details.

Required headers

Header | Description
Authorization | Bearer token (Bearer <token>). Alternatively, pass it via Sec-WebSocket-Protocol.
customer-id | Customer/environment identifier. Returns 403 Forbidden if missing.

Conditionally required headers

Header | Condition | Description
user-guid or external-user-id | When using an M2M (S2S) token | At least one must be provided. Returns 403 Forbidden if both are missing.

Optional headers

Header | Description
product-id | Product identifier (used for license validation context).

Authentication methods

Method 1: Authorization header

GET /ws HTTP/1.1
Host: ambient-audio-service.copilot.us.dragon.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
Authorization: Bearer <token>
customer-id: <customer-uuid>
external-user-id: <user-id>

Method 2: Sec-WebSocket-Protocol

The browser WebSocket API (new WebSocket(url, protocols)) does not allow setting custom headers such as Authorization. To pass authentication data during the WebSocket handshake from such clients, send it in the Sec-WebSocket-Protocol header instead.

The service supports two subprotocol formats:

Format A: Simple Bearer prefix

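Pass the token directly with a Bearer prefix. Use this when the client can still send customer-id and the other required headers as separate HTTP headers on the upgrade request (for example, a non-browser client that prefers to carry the token in the subprotocol). The browser WebSocket API cannot use this format, because it rejects subprotocol values that contain a space: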

GET /ws HTTP/1.1
Host: streaming.ambient-audio-service.copilot.us.dragon.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: Bearer <token>
customer-id: <customer-uuid>
external-user-id: <user-id>

Format B: Key-value list (browser SDK)

Pass the token and customer-id together in the subprotocol list. Use this when you cannot set any custom headers (for example, the browser WebSocket API):

GET /ws HTTP/1.1
Host: streaming.ambient-audio-service.copilot.us.dragon.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: sec-websocket-protocol, <token>, customer-id, <customer-uuid>

Parsing rules for Format B:

Position | Value | Description
0 | sec-websocket-protocol | Key marker. Also echoed in the server's response header for WebSocket handshake compliance.
1 | <token> | The raw JWT (without the Bearer prefix).
2 | customer-id | Key for the customer/environment identifier.
3 | <customer-uuid> | Value of customer-id.
  • Values are comma-delimited and trimmed of whitespace.
  • The server locates keys by name (not strictly by position), then reads the next value as the associated value.
  • Only customer-id can be passed in the subprotocol list. The user-guid, external-user-id, and product-id headers must still be sent as separate HTTP headers on the upgrade request.
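A rough Python sketch of the parsing rules above, useful for checking what a client will send. It approximates the documented behavior (split on commas, trim, locate keys by name, read the following value) and is not the service's implementation; the function names and the output field names (token, customer_id) are hypothetical.

```python
def build_subprotocols(token: str, customer_id: str) -> list[str]:
    """Client side: the values a browser client passes as new WebSocket(url, protocols)."""
    return ["sec-websocket-protocol", token, "customer-id", customer_id]


def parse_subprotocol_header(header_value: str) -> dict[str, str]:
    """Split on commas, trim whitespace, then read the value that follows each known key."""
    values = [v.strip() for v in header_value.split(",")]
    # Hypothetical mapping from key marker to output field name.
    keys = {"sec-websocket-protocol": "token", "customer-id": "customer_id"}
    result: dict[str, str] = {}
    for i, value in enumerate(values[:-1]):  # the last entry cannot be a key with a value
        field = keys.get(value)
        if field is not None:
            result[field] = values[i + 1]
    return result


# Simulate the comma-separated header a browser builds from the protocol list.
header = ", ".join(build_subprotocols("eyJhbGciOi.payload.sig", "11111111-2222-3333-4444-555555555555"))
print(parse_subprotocol_header(header))
```

Because keys are located by name, the same result comes back if customer-id appears before the token marker.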

Note

In JavaScript browser clients, each value is passed as a separate subprotocol: new WebSocket(url, ["sec-websocket-protocol", token, "customer-id", customerId]). The browser serializes these as the comma-separated Sec-WebSocket-Protocol header.

Validation responses

Scenario | Result
Missing or invalid token | 401 Unauthorized
Missing customer-id | 403 Forbidden
Missing both user-guid and external-user-id with M2M token | 403 Forbidden
Failed license check | 403 Forbidden
Valid token and headers | WebSocket upgrade proceeds (101 Switching Protocols)
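A client can use the validation table to pre-check a handshake or to interpret a rejected upgrade. The Python sketch below encodes the table. The order in which the service evaluates these checks is not documented here and is assumed, and the function and parameter names are hypothetical.

```python
from typing import Optional


def handshake_status(
    token_valid: bool,
    customer_id: Optional[str],
    is_m2m_token: bool,
    user_guid: Optional[str],
    external_user_id: Optional[str],
    license_ok: bool,
) -> int:
    """Map the validation table to an HTTP status (check order is an assumption)."""
    if not token_valid:
        return 401  # missing or invalid token
    if not customer_id:
        return 403  # missing customer-id
    if is_m2m_token and not (user_guid or external_user_id):
        return 403  # M2M token without user-guid or external-user-id
    if not license_ok:
        return 403  # failed license check
    return 101  # upgrade proceeds (Switching Protocols)
```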

Message format

All text messages that the client sends use the following header format:

Path=<message-path>
X-MS-Request-Id=<guid-request-id>
X-Timestamp=<iso8601-utc-timestamp>

{JSON body}

Format rules:

  • Headers are separated by \r\n (CRLF)
  • Headers and body are separated by a blank line (\r\n\r\n)
  • Valid message paths: RetrieveConfiguration, RecordingOpen, RecordingClose, StartProcessing
  • X-MS-Request-Id must be a valid GUID
  • X-Timestamp must be ISO 8601 UTC format (for example, 2025-08-11T16:45:00.547Z)
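A minimal Python sketch of composing and parsing this envelope with the standard library. The helper names are hypothetical; the body is serialized with json.dumps, and the request ID defaults to a random GUID from uuid.uuid4.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional

VALID_PATHS = {"RetrieveConfiguration", "RecordingOpen", "RecordingClose", "StartProcessing"}


def utc_timestamp(now: Optional[datetime] = None) -> str:
    """ISO 8601 UTC with milliseconds and a trailing Z, e.g. 2025-08-11T16:45:00.547Z."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_message(path: str, body: dict, request_id: Optional[str] = None,
                  timestamp: Optional[str] = None) -> str:
    """Compose a client text message: CRLF-separated headers, a blank line, then JSON."""
    if path not in VALID_PATHS:
        raise ValueError(f"unknown message path: {path}")
    headers = [
        f"Path={path}",
        f"X-MS-Request-Id={request_id or uuid.uuid4()}",
        f"X-Timestamp={timestamp or utc_timestamp()}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + json.dumps(body)


def parse_message(text: str) -> tuple[dict, dict]:
    """Split a message back into (headers, body) and check that the request ID is a GUID."""
    head, _, body = text.partition("\r\n\r\n")
    headers = dict(line.split("=", 1) for line in head.split("\r\n"))
    uuid.UUID(headers["X-MS-Request-Id"])  # raises ValueError if it is not a GUID
    return headers, json.loads(body)
```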

Example message:

Path=RecordingOpen
X-MS-Request-Id=12345678-1234-1234-1234-123456789012
X-Timestamp=2025-08-11T16:45:00.547Z

{ "recordingId": "...", "dataFormat": {...}, ... }

Endpoints

GET /ws/retrieveConfiguration

Type: Unary (short-lived WebSocket connection)

Retrieves the service configuration for a given product, partner, and customer. The response includes supported audio locales and recording duration limits. Call this before starting a recording session.

Connection lifecycle

1. Client sends HTTP GET /ws/retrieveConfiguration with Upgrade header and Authorization token
2. Server validates token and accepts WebSocket upgrade (101 Switching Protocols)
3. Client sends a single text message with Path=RetrieveConfiguration and JSON body
4. Server returns the configuration response
5. Server closes WebSocket connection (1000 Normal Closure)

Request

Message path: RetrieveConfiguration

Field | Type | Required | Description
productId | string | Yes | The Microsoft unique identifier of the product.
partnerId | string | Yes | The Microsoft unique identifier for the partner.
customerId | string | Yes | The Microsoft unique identifier of the customer.
externalIdentifiers | ExternalIdentifier[] | No | List of external identifiers. Use type "userId" for the partner's user identifier.

Example request:

Path=RetrieveConfiguration
X-MS-Request-Id=7a1223d0-272b-4f14-9f5c-e2ff5efd775e
X-Timestamp=2025-08-10T17:49:49.739Z

{
  "productId": "<product-guid>",
  "partnerId": "<partner-guid>",
  "customerId": "<customer-guid>",
  "externalIdentifiers": [
    { "type": "userId", "identifier": "<external-user-id>" }
  ]
}

Response

The server responds with a text message in the format RetrieveConfiguration: {json}.

Field | Type | Description
EncounterWarnSeconds | uint32 | Recording duration, in seconds, beyond which processing quality may degrade. Warn the user when it is reached.
EncounterMaxSeconds | uint32 | Maximum audio duration allowed, in seconds. Stop recording at this limit.
SupportedRecordingLocales | string[] | Locales accepted for audio recording input (IETF BCP 47).
SupportedEncounterReportLocales | string[] | Locales available for encounter report output (IETF BCP 47).

Example response:

RetrieveConfiguration: {"EncounterWarnSeconds":2700,"EncounterMaxSeconds":4500,"SupportedRecordingLocales":["en-US","de-DE","es-US","fr-FR"],"SupportedEncounterReportLocales":["en-US","de-DE","es-US","fr-FR"]}

Note

Unary responses are prefixed with the message path and a colon (for example, RetrieveConfiguration: {...}). Response field names are PascalCase.
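A Python sketch that strips the path prefix from a unary response and applies the duration limits while recording. The function names are hypothetical, and treating "at the limit" as greater-than-or-equal is an assumption.

```python
import json


def parse_unary_response(text: str, expected_path: str) -> dict:
    """Split 'Path: {json}' at the first colon and return the JSON object."""
    prefix, sep, payload = text.partition(":")
    if not sep or prefix.strip() != expected_path:
        raise ValueError(f"expected a {expected_path} response, got: {text[:40]!r}")
    return json.loads(payload)


def duration_status(elapsed_seconds: int, config: dict) -> str:
    """Hypothetical helper: compare elapsed recording time with the configured limits."""
    if elapsed_seconds >= config["EncounterMaxSeconds"]:
        return "stop"
    if elapsed_seconds >= config["EncounterWarnSeconds"]:
        return "warn"
    return "ok"


raw = ('RetrieveConfiguration: {"EncounterWarnSeconds":2700,"EncounterMaxSeconds":4500,'
       '"SupportedRecordingLocales":["en-US","de-DE","es-US","fr-FR"],'
       '"SupportedEncounterReportLocales":["en-US","de-DE","es-US","fr-FR"]}')
config = parse_unary_response(raw, "RetrieveConfiguration")
print(duration_status(3000, config))  # warn
```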

Errors

Scenario | Result
Invalid or missing required fields | Server closes with code 1011 (InternalServerError).
Unknown message path | Server closes with code 1007 (InvalidPayloadData), reason: "Unknown message path".
Failure communicating with downstream system | Server closes with code 1011 (InternalServerError).
Non-WebSocket request | HTTP 400.
Missing or invalid token | HTTP 401.

GET /ws

Type: Bidirectional streaming (long-lived WebSocket connection)

Streams an ambient audio recording to the server. The connection supports three client message types:

  1. RecordingOpen (text) - Initialize the recording session.
  2. DataChunk (binary) - Stream audio data.
  3. RecordingClose (text) - End the recording.

The server responds with DataStorageResponse acknowledgments and a final RecordingCloseResponse.

Connection lifecycle

1. Client sends HTTP GET /ws with Upgrade header and Authorization token
2. Server validates token and accepts WebSocket upgrade (101 Switching Protocols)
3. Client sends RecordingOpen (text message) to initialize the recording session
4. Client sends DataChunk messages (binary) to stream audio data
5. Server sends DataStorageResponse (text) approximately every 10 KB of stored data
6. Client sends RecordingClose (text message) to end the recording
7. Server sends RecordingCloseResponse (text) confirming total bytes stored
8. Connection closes (1000 Normal Closure)
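The ordering in this lifecycle (RecordingOpen first, then binary DataChunk frames, then RecordingClose) can be enforced on the client with a small state machine. This Python sketch is a hypothetical guard, not part of any SDK, and it tracks only what the client sends, not server responses.

```python
from enum import Enum, auto


class State(Enum):
    CONNECTED = auto()  # WebSocket upgraded, nothing sent yet
    OPEN = auto()       # RecordingOpen sent; DataChunk frames allowed
    CLOSING = auto()    # RecordingClose sent; waiting for RecordingCloseResponse


# Allowed client messages per state, following the lifecycle steps.
TRANSITIONS = {
    (State.CONNECTED, "RecordingOpen"): State.OPEN,
    (State.OPEN, "DataChunk"): State.OPEN,
    (State.OPEN, "RecordingClose"): State.CLOSING,
}


class RecordingSession:
    """Hypothetical client-side guard that rejects out-of-order messages."""

    def __init__(self) -> None:
        self.state = State.CONNECTED

    def send(self, message_type: str) -> None:
        next_state = TRANSITIONS.get((self.state, message_type))
        if next_state is None:
            raise RuntimeError(f"{message_type} not allowed in state {self.state.name}")
        self.state = next_state
```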

Message: RecordingOpen

Initializes the recording session. Must be the first message sent on the connection.

Message path: RecordingOpen
WebSocket message type: Text

Field | Type | Required | Description
recordingId | string | Yes | A caller-defined unique identifier for the recording.
dataFormat | object | Yes | Audio encoding format. Must contain exactly one of: pcm, opus, webmOpus, or byteStream.
dataFormat.pcm | object | - | Signed 16-bit LE PCM. Fields: sampleRateHz, bitcount, channels.
dataFormat.opus | object | - | Ogg Opus. Fields: sampleRateHz.
dataFormat.webmOpus | object | - | WebM Opus. Fields: sampleRateHz.
dataFormat.byteStream | object | - | Opaque byte stream. Fields: formatSpecifier.
ambientSessionData | object | Yes | Session metadata. Must include productId, partnerId, customerId, and correlationId.
actions | string[] | No | AI actions to perform. Use "generate-draft" for draft generation. If omitted, only a transcript is generated.
reason | string | No | Why the recording was started. Values: "ui", "wakeWord", "systemResume".
startingOffset | uint32 | No | Byte offset for resuming after an interruption. Set to the last confirmed dataStored value.
previousEncounterSessions | object[] | No | Previous sessions in the encounter. Each entry: sessionId, creationDate, sessionLengthSeconds.
outputFormIds | string[] | No | Template form identifiers for note generation.

Example request:

Path=RecordingOpen
X-MS-Request-Id=12345678-1234-1234-1234-123456789012
X-Timestamp=2025-08-11T16:45:00.547Z

{
  "recordingId": "<recording-uuid>",
  "dataFormat": {
    "pcm": {
      "sampleRateHz": 16000,
      "bitcount": 16,
      "channels": 1
    }
  },
  "ambientSessionData": {
    "productId": "<product-guid>",
    "partnerId": "<partner-guid>",
    "customerId": "<customer-guid>",
    "correlationId": "<correlation-guid>",
    "practitionerInfo": {
      "externalIdentifiers": [
        { "type": "fhirId", "identifier": "<practitioner-fhir-id>" }
      ],
      "name": {
        "givenName": "Jane",
        "familyName": "Smith",
        "suffix": "MD"
      }
    },
    "externalIdentifiers": [
      { "type": "userId", "identifier": "<external-user-id>" }
    ],
    "creationDate": "2026-02-20T14:30:00.000Z",
    "localeInfo": {
      "recordingLocales": ["en-US"],
      "encounterReportLocale": "en-US",
      "encounterUxLocale": "en-US"
    }
  },
  "actions": ["generate-draft"],
  "reason": "ui",
  "startingOffset": 0
}
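A Python sketch that assembles a RecordingOpen body and rejects invalid cases on the client, before the server closes the connection. The function name is hypothetical, generating recordingId with uuid4 is one possible choice, and the checks cover only stated constraints: exactly one dataFormat variant, the four required ambientSessionData fields, and a non-negative startingOffset.

```python
import uuid
from typing import Optional

DATA_FORMATS = {"pcm", "opus", "webmOpus", "byteStream"}
REQUIRED_SESSION_FIELDS = {"productId", "partnerId", "customerId", "correlationId"}


def build_recording_open(ambient_session: dict, data_format: dict,
                         actions: Optional[list] = None, starting_offset: int = 0,
                         recording_id: Optional[str] = None) -> dict:
    """Assemble a RecordingOpen body and check the constraints stated in the field table."""
    if len(data_format) != 1 or not DATA_FORMATS.issuperset(data_format):
        raise ValueError("dataFormat must contain exactly one of pcm, opus, webmOpus, byteStream")
    missing = REQUIRED_SESSION_FIELDS - set(ambient_session)
    if missing:
        raise ValueError(f"ambientSessionData is missing: {sorted(missing)}")
    if starting_offset < 0:
        raise ValueError("startingOffset cannot be negative")
    body = {
        # recordingId is caller-defined; a random UUID is one reasonable choice.
        "recordingId": recording_id or str(uuid.uuid4()),
        "dataFormat": data_format,
        "ambientSessionData": ambient_session,
        "startingOffset": starting_offset,
    }
    if actions:
        body["actions"] = list(actions)  # omitted entirely: transcript only
    return body
```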

Note

RecordingOpen does not produce an immediate response. If the request is invalid, the server closes the connection with an error close code.

Message: DataChunk

Streams a chunk of audio data.

WebSocket message type: Binary

DataChunk messages are sent as binary WebSocket frames containing a JSON-serialized object. They do not use the Path=/X-MS-Request-Id=/X-Timestamp= header format.

Field | Type | Required | Description
DataStart | uint32 | Yes | Byte offset where this chunk begins within the recording.
Data | byte[] | Yes | Raw audio bytes (base64-encoded in the JSON representation).

Note

DataChunk uses PascalCase field names (DataStart, Data).

Example (binary frame content):

{
  "DataStart": 0,
  "Data": "<base64-encoded-audio-bytes>"
}
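The framing can be sketched with the Python standard library: slice the recording, base64-encode each slice, and send the UTF-8 JSON as a binary frame. The 3,200-byte chunk size is an illustrative assumption, not a documented requirement, and the helper name is hypothetical. Sending the frames is left to whichever WebSocket client you use.

```python
import base64
import json


def chunk_frames(audio: bytes, chunk_size: int = 3200, start: int = 0):
    """Yield (DataStart, frame) pairs; each frame is the payload of one binary WebSocket message."""
    for offset in range(start, len(audio), chunk_size):
        piece = audio[offset:offset + chunk_size]
        payload = {"DataStart": offset, "Data": base64.b64encode(piece).decode("ascii")}
        yield offset, json.dumps(payload).encode("utf-8")


# 3,200 bytes is 100 ms of 16 kHz, 16-bit, mono PCM (16000 * 2 bytes * 0.1 s).
offsets = [offset for offset, _ in chunk_frames(bytes(8000))]
print(offsets)  # [0, 3200, 6400]
```

The start parameter lets a resumed session begin at a byte offset inside the full recording, so DataStart stays relative to the start of the recording.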

Server response - DataStorageResponse (text message):

Sent approximately every 10 KB of stored data. Not returned for every DataChunk.

{ "dataStored": { "dataStored": 32768 } }

Note

Streaming responses (DataStorageResponse, RecordingCloseResponse) use camelCase field names because they are serialized using protobuf's JSON formatter. This differs from unary responses (RetrieveConfiguration, StartProcessing), which use PascalCase because they are serialized using Newtonsoft.Json.

Note

When streaming resumes after an interruption, the first DataStorageResponse contains the byte position stored prior to the interruption. Duplicate bytes are silently discarded.
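Resuming can be sketched as tracking the highest acknowledged dataStored value and restarting from it. This Python sketch assumes server text messages arrive as the JSON shown in this section. Because the server discards duplicate bytes, resending from the last confirmed offset is safe even if some bytes past it were already stored. The class name is hypothetical.

```python
import json


class ResumeTracker:
    """Hypothetical helper: remember the last acknowledged byte count for startingOffset."""

    def __init__(self) -> None:
        self.confirmed = 0

    def on_text_message(self, text: str) -> None:
        msg = json.loads(text)
        # DataStorageResponse and RecordingCloseResponse both carry a nested dataStored count.
        for key in ("dataStored", "recordingCloses"):
            if key in msg:
                self.confirmed = max(self.confirmed, msg[key]["dataStored"])

    def resume_slice(self, audio: bytes) -> tuple[int, bytes]:
        """startingOffset for the next RecordingOpen, and the bytes still to send."""
        return self.confirmed, audio[self.confirmed:]
```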

Message: RecordingClose

Ends the recording.

Message path: RecordingClose
WebSocket message type: Text

Field | Type | Required | Description
recordingId | string | Yes | Must match the recordingId from RecordingOpen.
recordingLengthSeconds | uint32 | Yes | Total recording duration in seconds.
reason | string | No | Why the recording was stopped. Values: "ui", "voiceCommand", "btDisconnected", "externalInterruption", "unexpectedError", "maxDurationExceeded".

Example request:

Path=RecordingClose
X-MS-Request-Id=12345678-1234-1234-1234-123456789013
X-Timestamp=2025-08-11T17:17:07.812Z

{
  "recordingId": "<recording-uuid>",
  "recordingLengthSeconds": 120,
  "reason": "ui"
}

Server response - RecordingCloseResponse (text message):

{ "recordingCloses": { "dataStored": 65536 } }
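For PCM input, recordingLengthSeconds can be derived from the byte count, and the RecordingCloseResponse total can be compared with what the client sent. A minimal Python sketch; rounding down to whole seconds is an assumption, since the field is a uint32 and no rounding rule is stated. Function names are hypothetical. For compressed formats (opus, webmOpus), the duration has to come from the encoder or a wall clock instead.

```python
import json


def pcm_seconds(byte_count: int, sample_rate_hz: int, bitcount: int, channels: int) -> int:
    """Whole seconds of audio represented by byte_count bytes of PCM."""
    bytes_per_second = sample_rate_hz * (bitcount // 8) * channels
    return byte_count // bytes_per_second


def check_close(text: str, bytes_sent: int) -> bool:
    """True if RecordingCloseResponse reports every byte the client sent as stored."""
    stored = json.loads(text)["recordingCloses"]["dataStored"]
    return stored == bytes_sent


# 16 kHz, 16-bit, mono PCM is 32,000 bytes per second, so 120 s is 3,840,000 bytes.
print(pcm_seconds(3_840_000, 16000, 16, 1))  # 120
```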

Streaming errors

Scenario | WebSocket close code | Description
Normal completion | 1000 (NormalClosure) | Graceful close after RecordingClose.
Invalid message body | 1007 (InvalidPayloadData) | The message cannot be parsed in the expected format.
Unknown message path | 1007 (InvalidPayloadData) | Message path not recognized.
Protocol error | 1002 (ProtocolError) | Malformed message format.
Internal write failure | 1011 (InternalServerError) | "Resource exhausted please try again later."
Unexpected server exception | 1011 (InternalServerError) | Server error during processing.
Negative startingOffset | 1007 (InvalidPayloadData) | "StartingOffset cannot be negative".
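A client has to decide, per close code, whether a session ended normally, failed because of its own request, or should be retried. The Python sketch below is a hypothetical policy for GET /ws, not service guidance. It uses RFC 6455 numbering, in which InvalidPayloadData (WebSocketCloseStatus.InvalidPayloadData in .NET) is 1007, and it treats 1011 and abnormal closures as retryable.

```python
NORMAL_CLOSURE = 1000
PROTOCOL_ERROR = 1002
INVALID_PAYLOAD_DATA = 1007  # RFC 6455 "invalid frame payload data"; .NET InvalidPayloadData


def classify_close(code: int) -> str:
    """Hypothetical reconnect policy for GET /ws based on the close code."""
    if code == NORMAL_CLOSURE:
        return "done"
    if code in (PROTOCOL_ERROR, INVALID_PAYLOAD_DATA):
        # The client sent something the server could not accept; resending it unchanged won't help.
        return "fix-request"
    # 1011 (InternalServerError, including "Resource exhausted please try again later."),
    # abnormal closures such as 1006, and anything else: reconnect with backoff and resume.
    return "retry"
```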

GET /ws/startProcessing

Type: Unary (short-lived WebSocket connection)

Signals Dragon Copilot to begin processing a previously recorded ambient session. Processing happens asynchronously; the response confirms the request was accepted.

Connection lifecycle

1. Client sends HTTP GET /ws/startProcessing with Upgrade header and Authorization token
2. Server validates token and accepts WebSocket upgrade (101 Switching Protocols)
3. Client sends a single text message with Path=StartProcessing and JSON body
4. Server returns the processing response
5. Server closes WebSocket connection (1000 Normal Closure)

Request

Message path: StartProcessing

Field | Type | Required | Description
ambientSessionData | object | Yes | Session metadata. Must include productId, partnerId, customerId, and correlationId.
actions | string[] | Yes | AI actions to perform. Must not be empty. Use "generate-draft" for draft generation.
requestTime | string (ISO 8601) | No | Timestamp when the user triggered processing.
recordingsToProcess | string[] | No | Recording IDs to include. If omitted, all recordings for the session are processed.

Example request:

Path=StartProcessing
X-MS-Request-Id=516c8e09-f9c2-4ff2-8bc9-9d4e89f921fb
X-Timestamp=2025-08-11T19:29:27.608Z

{
  "ambientSessionData": {
    "productId": "<product-guid>",
    "partnerId": "<partner-guid>",
    "customerId": "<customer-guid>",
    "correlationId": "<correlation-guid>",
    "practitionerInfo": {
      "externalIdentifiers": [
        { "type": "fhirId", "identifier": "<practitioner-fhir-id>" }
      ],
      "name": {
        "givenName": "Jane",
        "familyName": "Smith",
        "suffix": "MD"
      }
    },
    "externalIdentifiers": [
      { "type": "userId", "identifier": "<external-user-id>" }
    ]
  },
  "actions": ["generate-draft"],
  "recordingsToProcess": ["<recording-uuid-1>", "<recording-uuid-2>"]
}
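A Python sketch of building the StartProcessing body from the field table. The function name is hypothetical. It enforces only the non-empty actions rule and omits optional fields that are not supplied. The result still needs the Path=StartProcessing message headers before it is sent.

```python
from typing import Optional


def build_start_processing(ambient_session: dict, actions: list,
                           recordings: Optional[list] = None,
                           request_time: Optional[str] = None) -> dict:
    """Assemble a StartProcessing body, enforcing that actions is non-empty."""
    if not actions:
        raise ValueError("actions is required and must not be empty")
    body = {"ambientSessionData": ambient_session, "actions": list(actions)}
    if recordings:
        # Omitted (or empty) means every recording for the session is processed.
        body["recordingsToProcess"] = list(recordings)
    if request_time:
        body["requestTime"] = request_time
    return body
```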

Response

The server responds with a text message in the format StartProcessing: {json}.

Field | Type | Description
StreamingResponse.ErrorCode | uint32 | 0 indicates success. Non-zero indicates an error.
StreamingResponse.ErrorMessage | string | Human-readable message.
StreamingResponse.DetailedErrorInformation | string | Additional diagnostic information.

Success response:

StartProcessing: {"StreamingResponse":{"ErrorCode":0,"ErrorMessage":"","DetailedErrorInformation":""}}

Error response:

StartProcessing: {"StreamingResponse":{"ErrorCode":1,"ErrorMessage":"Processing failed","DetailedErrorInformation":"Session not found"}}

Note

Unary responses are prefixed with the message path and a colon. Response field names are PascalCase.
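Because failures on this endpoint can surface either as a close code or as a non-zero ErrorCode in an otherwise well-formed response, the client should check the body. A minimal Python sketch; the function name is hypothetical. A zero ErrorCode only means the request was accepted, not that processing has finished.

```python
import json


def check_start_processing(text: str) -> dict:
    """Parse 'StartProcessing: {json}'; return StreamingResponse or raise on a non-zero ErrorCode."""
    prefix, _, payload = text.partition(":")
    if prefix.strip() != "StartProcessing":
        raise ValueError("not a StartProcessing response")
    result = json.loads(payload)["StreamingResponse"]
    if result["ErrorCode"] != 0:
        raise RuntimeError(f'{result["ErrorMessage"]}: {result["DetailedErrorInformation"]}')
    return result  # accepted; processing continues asynchronously
```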

Usage notes

  • actions is required and must not be empty. Use "generate-draft" to trigger AI draft generation.
  • recordingsToProcess is optional. When omitted, all recordings for the session are processed.
  • Processing is asynchronous: the response confirms the request was accepted, not that processing is complete.

Errors

Scenario | Result
Missing required fields | Server closes with code 1011 (InternalServerError).
Invalid request body | Server closes with code 1011 (InternalServerError).
Unknown message path | Server closes with code 1007 (InvalidPayloadData), reason: "Unknown message path".
Downstream processing failure | ErrorCode is non-zero in the response body.
Non-WebSocket request | HTTP 400.
Missing or invalid token | HTTP 401.

Common types

AmbientSession fields

Field | Type | Required | Description
productId | string | Yes | Microsoft unique identifier of the product.
partnerId | string | Yes | Microsoft unique identifier for the partner.
customerId | string | Yes | Microsoft unique identifier of the customer.
correlationId | string | Yes | Partner-assigned unique identifier of the session (GUID). Used to correlate results.
practitionerInfo | object | No | Practitioner metadata (identifiers, name, specialty).
ehrInstanceId | string | No | EHR instance identifier.
externalIdentifiers | ExternalIdentifier[] | No | External identifiers. Use type "userId" for the partner's user identifier.
creationDate | string (ISO 8601) | No | Session creation timestamp.
dstOffsetSeconds | int32 | No | DST offset in seconds.
clientInfo | object | No | Client application metadata (applicationId, applicationVersion, sdkVersion, deviceId, deviceType).
localeInfo | object | No | Locale preferences: recordingLocales, encounterReportLocale, encounterUxLocale.
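Both RecordingOpen and StartProcessing require these four fields, so a shared client-side check is useful. This Python sketch is hypothetical: the GUID check on correlationId follows the description above, and no format is assumed for the other identifiers.

```python
import uuid

REQUIRED = ("productId", "partnerId", "customerId", "correlationId")


def validate_ambient_session(session: dict) -> list[str]:
    """Return a list of problems; an empty list means the required fields look valid."""
    problems = [f"missing {name}" for name in REQUIRED if not session.get(name)]
    correlation_id = session.get("correlationId")
    if correlation_id:
        try:
            uuid.UUID(correlation_id)
        except ValueError:
            problems.append("correlationId is not a GUID")
    return problems
```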

ExternalIdentifier

Field | Type | Description
type | string | Identifier type (for example, "userId", "fhirId", "npi", "encounterId").
identifier | string | The identifier value.

Best practices

  1. Call retrieveConfiguration before recording to confirm supported locales and duration limits.
  2. Implement reconnection logic with exponential backoff for the bidirectional streaming endpoint.
  3. Track dataStored values to set startingOffset correctly when resuming after interruption.
  4. Include previousEncounterSessions when splitting recordings across multiple sessions.
  5. Call startProcessing after closing the recording. It requires a separate WebSocket connection.
  6. Monitor WebSocket close codes to distinguish normal closure from errors.
  7. Handle backpressure - don't send audio faster than the server can process.
  8. Initiate graceful shutdown - send a close frame rather than abruptly disconnecting.
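Reconnection with exponential backoff can be sketched as a delay schedule that doubles per attempt up to a cap. Optionally it applies "full jitter" (a uniform random delay up to the ceiling) so that many clients do not reconnect in lockstep. The base, cap, and jitter strategy are illustrative assumptions, not service-defined values. After each reconnect, resume from the last confirmed dataStored offset.

```python
import random
from typing import Optional


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Exponential backoff delays in seconds; full jitter when an rng is supplied."""
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling) if rng else ceiling)
    return delays


print(backoff_delays(6))  # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
```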