Ambient Audio Streaming WebSocket API reference

The Ambient Audio Streaming (AAS) 2.0 WebSocket API enables partners to stream ambient audio recordings in real time and submit them for downstream processing by Dragon Copilot.

The WebSocket transport is one of two streaming options (alongside gRPC). It exposes three endpoints:

Endpoint	Type	Description
GET /ws/retrieveConfiguration	Unary	Returns the service configuration for a given partner, including supported recording and report locales and recording duration limits.
GET /ws	Bidirectional streaming	Streams an ambient audio recording to the server. The client opens a session, streams audio data, and closes the recording.
GET /ws/startProcessing	Unary	Signals Dragon Copilot to begin processing a previously recorded ambient session.

Authentication

All WebSocket endpoints require bearer token authentication. Tokens can be passed via the Authorization header or the Sec-WebSocket-Protocol subprotocol header.

Supported token types:

S2S (Server-to-Server): Machine-to-machine token issued via MISE. After authentication, the service validates the calling application's identity against a configured allowlist.
Entra ID User Token: User-delegated token issued by Microsoft Entra ID.
EIS Bearer Token: JWT issued by the EHR Integration Service (EIS). See Token launch integration for details.

Required headers

Header	Description
`Authorization`	Bearer token (`Bearer <token>`). Alternatively, pass via `Sec-WebSocket-Protocol`.
`customer-id`	Customer/environment identifier. Returns `403 Forbidden` if missing.

Conditionally required headers

Header	Condition	Description
`user-guid` or `external-user-id`	When using M2M (S2S) token	At least one must be provided. Returns `403 Forbidden` if both are missing.

Optional headers

Header	Description
`product-id`	Product identifier (used for license validation context).

Authentication methods

Method 1: Authorization header

GET /ws HTTP/1.1
Host: ambient-audio-service.copilot.us.dragon.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
Authorization: Bearer <token>
customer-id: <customer-uuid>
external-user-id: <user-id>

Method 2: Sec-WebSocket-Protocol

The browser WebSocket API (new WebSocket(url, protocols)) does not allow setting custom headers like Authorization. To pass authentication data during the WebSocket handshake from browser-based clients, use the Sec-WebSocket-Protocol header as a comma-delimited key-value list.

The service supports two subprotocol formats:

Format A: Simple Bearer prefix

Pass the token directly with a Bearer prefix. Use this when you can set customer-id and other headers separately (for example, in non-browser clients that can send custom headers on the upgrade request but pass the token as a subprotocol):

GET /ws HTTP/1.1
Host: streaming.ambient-audio-service.copilot.us.dragon.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: Bearer <token>
customer-id: <customer-uuid>
external-user-id: <user-id>

Format B: Key-value list (browser SDK)

Pass the token and customer-id together in the subprotocol list. Use this when you cannot set any custom headers (for example, the browser WebSocket API):

GET /ws HTTP/1.1
Host: streaming.ambient-audio-service.copilot.us.dragon.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: sec-websocket-protocol, <token>, customer-id, <customer-uuid>

Parsing rules for Format B:

Position	Value	Description
0	`sec-websocket-protocol`	Key marker. Also echoed in the server's response header for WebSocket handshake compliance.
1	`<token>`	The raw JWT token (without `Bearer` prefix).
2	`customer-id`	Key for customer/environment identifier.
3	`<customer-uuid>`	Value of customer-id.

Values are comma-delimited and trimmed of whitespace.
The server locates keys by name (not strictly by position), then reads the next value as the associated value.
Apart from the token, only customer-id can be passed in the subprotocol list. The user-guid, external-user-id, and product-id values must still be sent as separate HTTP headers on the upgrade request.

Note

In JavaScript browser clients, each value is passed as a separate subprotocol: new WebSocket(url, ["sec-websocket-protocol", token, "customer-id", customerId]). The browser serializes these as the comma-separated Sec-WebSocket-Protocol header.
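
The Format B parsing rules can be illustrated with a small sketch. The function names `build_subprotocols` and `parse_subprotocol_header` are hypothetical, and the parser only approximates the documented behavior (locate a key by name, read the next entry as its value); the service's actual implementation may differ.

```python
def build_subprotocols(token: str, customer_id: str) -> list[str]:
    """Return the Format B values in the documented order."""
    return ["sec-websocket-protocol", token, "customer-id", customer_id]


def parse_subprotocol_header(header_value: str) -> dict[str, str]:
    """Find each key by name and read the following entry as its value."""
    # Values are comma-delimited and trimmed of whitespace.
    parts = [part.strip() for part in header_value.split(",")]
    found: dict[str, str] = {}
    # Hypothetical result names ("token", "customer_id") chosen for this sketch.
    for key, name in (("sec-websocket-protocol", "token"), ("customer-id", "customer_id")):
        if key in parts:
            index = parts.index(key)
            if index + 1 < len(parts):
                found[name] = parts[index + 1]
    return found


header = ", ".join(build_subprotocols("<token>", "<customer-uuid>"))
parsed = parse_subprotocol_header(header)
```

Because keys are located by name, the same parser accepts the pairs in either order.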

Validation responses

Scenario	Result
Missing valid token	`401 Unauthorized`
Missing `customer-id`	`403 Forbidden`
Missing both `user-guid` and `external-user-id` with M2M token	`403 Forbidden`
Failed license check	`403 Forbidden`
Valid token and headers	WebSocket upgrade proceeds (`101 Switching Protocols`)
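
A client can run a similar check before connecting to catch missing headers early. The sketch below is illustrative only: `expected_handshake_status` is a hypothetical helper, the order of the checks is an assumption, and the license check is omitted because it can only be evaluated by the service.

```python
def expected_handshake_status(headers, token_is_valid, token_is_s2s):
    """Predict the handshake result from the validation table (client-side guess)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if not token_is_valid:
        return 401
    if not normalized.get("customer-id"):
        return 403
    has_user = normalized.get("user-guid") or normalized.get("external-user-id")
    if token_is_s2s and not has_user:
        return 403
    # The server-side license check may still return 403 at this point.
    return 101
```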

Message format

All text-based WebSocket messages use the following header format:

Path=<message-path>
X-MS-Request-Id=<guid-request-id>
X-Timestamp=<iso8601-utc-timestamp>

{JSON body}

Format rules:

Headers are separated by \r\n (CRLF)
Headers and body are separated by a blank line (\r\n\r\n)
Valid message paths: RetrieveConfiguration, RecordingOpen, RecordingClose, StartProcessing
X-MS-Request-Id must be a valid GUID
X-Timestamp must be ISO 8601 UTC format (for example, 2025-08-11T16:45:00.547Z)

Example message:

Path=RecordingOpen
X-MS-Request-Id=12345678-1234-1234-1234-123456789012
X-Timestamp=2025-08-11T16:45:00.547Z

{ "recordingId": "...", "dataFormat": {...}, ... }
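
As a minimal sketch of the format rules above, the following helpers assemble and split a text message. The names `build_text_message` and `parse_text_message` are hypothetical, and the millisecond-precision timestamp simply mirrors the example; other ISO 8601 UTC renderings may also be accepted.

```python
import json
import uuid
from datetime import datetime, timezone

VALID_PATHS = {"RetrieveConfiguration", "RecordingOpen", "RecordingClose", "StartProcessing"}


def build_text_message(path, body, request_id=None, now=None):
    """Join the three headers with CRLF, add a blank line, then the JSON body."""
    if path not in VALID_PATHS:
        raise ValueError(f"unknown message path: {path}")
    request_id = request_id or str(uuid.uuid4())
    # `now` must be timezone-aware; it is converted to UTC before formatting.
    utc = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"
    headers = [f"Path={path}", f"X-MS-Request-Id={request_id}", f"X-Timestamp={timestamp}"]
    return "\r\n".join(headers) + "\r\n\r\n" + json.dumps(body)


def parse_text_message(message):
    """Split a text message into a header dict and the decoded JSON body."""
    head, _, body = message.partition("\r\n\r\n")
    headers = dict(line.split("=", 1) for line in head.split("\r\n"))
    uuid.UUID(headers["X-MS-Request-Id"])  # raises ValueError if not a GUID
    return headers, json.loads(body)
```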

Endpoints

GET /ws/retrieveConfiguration

Type: Unary (short-lived WebSocket connection)

Retrieves the service configuration for a given product, partner, and customer. The response includes supported audio locales and recording duration limits. Call this before starting a recording session.

Connection lifecycle

1. Client sends HTTP GET /ws/retrieveConfiguration with Upgrade header and Authorization token
2. Server validates token and accepts WebSocket upgrade (101 Switching Protocols)
3. Client sends a single text message with Path=RetrieveConfiguration and JSON body
4. Server returns the configuration response
5. Server closes WebSocket connection (1000 Normal Closure)

Request

Message path: RetrieveConfiguration

Field	Type	Required	Description
`productId`	string	Yes	The Microsoft unique identifier of the product.
`partnerId`	string	Yes	The Microsoft unique identifier for the partner.
`customerId`	string	Yes	The Microsoft unique identifier of the customer.
`externalIdentifiers`	ExternalIdentifier[]	No	List of external identifiers. Use type `"userId"` for the partner's user identifier.

Example request:

Path=RetrieveConfiguration
X-MS-Request-Id=7a1223d0-272b-4f14-9f5c-e2ff5efd775e
X-Timestamp=2025-08-10T17:49:49.739Z

{
  "productId": "<product-guid>",
  "partnerId": "<partner-guid>",
  "customerId": "<customer-guid>",
  "externalIdentifiers": [
    { "type": "userId", "identifier": "<external-user-id>" }
  ]
}

Response

The server responds with a text message in the format RetrieveConfiguration: {json}.

Field	Type	Description
`EncounterWarnSeconds`	uint32	Duration in seconds at which processing quality may degrade. Warn the user.
`EncounterMaxSeconds`	uint32	Maximum duration in seconds of audio allowed. Stop recording at this limit.
`SupportedRecordingLocales`	string[]	Locales accepted for audio recording input (IETF BCP 47).
`SupportedEncounterReportLocales`	string[]	Locales available for encounter report output (IETF BCP 47).

Example response:

RetrieveConfiguration: {"EncounterWarnSeconds":2700,"EncounterMaxSeconds":4500,"SupportedRecordingLocales":["en-US","de-DE","es-US","fr-FR"],"SupportedEncounterReportLocales":["en-US","de-DE","es-US","fr-FR"]}

Note

Unary responses are prefixed with the message path and a colon (for example, RetrieveConfiguration: {...}). Response field names are PascalCase.
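
A client might handle the prefixed response along the following lines. This is a sketch under assumptions: `parse_unary_response` and `duration_state` are hypothetical helper names, and the "ok"/"warn"/"stop" states are just one way to act on the two duration limits.

```python
import json


def parse_unary_response(message, expected_path):
    """Strip the '<path>:' prefix and decode the JSON payload."""
    prefix, separator, payload = message.partition(":")
    if not separator or prefix.strip() != expected_path:
        raise ValueError(f"expected a {expected_path} response")
    return json.loads(payload)


def duration_state(elapsed_seconds, config):
    """Map elapsed recording time onto the configured limits."""
    if elapsed_seconds >= config["EncounterMaxSeconds"]:
        return "stop"
    if elapsed_seconds >= config["EncounterWarnSeconds"]:
        return "warn"
    return "ok"
```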

Errors

Scenario	Result
Invalid or missing required fields	Server closes with code 1011 (InternalServerError).
Unknown message path	Server closes with code 1007 (InvalidPayloadData), reason: `"Unknown message path"`.
Failure communicating with downstream system	Server closes with code 1011 (InternalServerError).
Non-WebSocket request	HTTP 400.
Missing or invalid token	HTTP 401.

GET /ws

Type: Bidirectional streaming (long-lived WebSocket connection)

Streams an ambient audio recording to the server. The connection supports three client message types:

RecordingOpen (text) - Initialize the recording session.
DataChunk (binary) - Stream audio data.
RecordingClose (text) - End the recording.

The server responds with DataStorageResponse acknowledgments and a final RecordingCloseResponse.

Connection lifecycle

1. Client sends HTTP GET /ws with Upgrade header and Authorization token
2. Server validates token and accepts WebSocket upgrade (101 Switching Protocols)
3. Client sends RecordingOpen (text message) to initialize the recording session
4. Client sends DataChunk messages (binary) to stream audio data
5. Server sends DataStorageResponse (text) approximately every 10 KB of stored data
6. Client sends RecordingClose (text message) to end the recording
7. Server sends RecordingCloseResponse (text) confirming total bytes stored
8. Connection closes (1000 Normal Closure)

Message: RecordingOpen

Initializes the recording session. Must be the first message sent on the connection.

Message path: RecordingOpen
WebSocket message type: Text

Field	Type	Required	Description
`recordingId`	string	Yes	A caller-defined unique identifier for the recording.
`dataFormat`	object	Yes	Audio encoding format. Must contain exactly one of: `pcm`, `opus`, `webmOpus`, or `byteStream`.
`dataFormat.pcm`	object	-	Signed 16-bit LE PCM. Fields: `sampleRateHz`, `bitcount`, `channels`.
`dataFormat.opus`	object	-	Ogg Opus. Fields: `sampleRateHz`.
`dataFormat.webmOpus`	object	-	WebM Opus. Fields: `sampleRateHz`.
`dataFormat.byteStream`	object	-	Opaque byte stream. Fields: `formatSpecifier`.
`ambientSessionData`	object	Yes	Session metadata. Must include `productId`, `partnerId`, `customerId`, and `correlationId`.
`actions`	string[]	No	AI actions to perform. Use `"generate-draft"` for draft generation. If omitted, only a transcript is generated.
`reason`	string	No	Why the recording was started. Values: `"ui"`, `"wakeWord"`, `"systemResume"`.
`startingOffset`	uint32	No	Byte offset for resuming after interruption. Set to last confirmed `dataStored` value.
`previousEncounterSessions`	object[]	No	Previous sessions in the encounter. Each: `sessionId`, `creationDate`, `sessionLengthSeconds`.
`outputFormIds`	string[]	No	Template form identifiers for note generation.

Example request:

Path=RecordingOpen
X-MS-Request-Id=12345678-1234-1234-1234-123456789012
X-Timestamp=2025-08-11T16:45:00.547Z

{
  "recordingId": "<recording-uuid>",
  "dataFormat": {
    "pcm": {
      "sampleRateHz": 16000,
      "bitcount": 16,
      "channels": 1
    }
  },
  "ambientSessionData": {
    "productId": "<product-guid>",
    "partnerId": "<partner-guid>",
    "customerId": "<customer-guid>",
    "correlationId": "<correlation-guid>",
    "practitionerInfo": {
      "externalIdentifiers": [
        { "type": "fhirId", "identifier": "<practitioner-fhir-id>" }
      ],
      "name": {
        "givenName": "Jane",
        "familyName": "Smith",
        "suffix": "MD"
      }
    },
    "externalIdentifiers": [
      { "type": "userId", "identifier": "<external-user-id>" }
    ],
    "creationDate": "2026-02-20T14:30:00.000Z",
    "localeInfo": {
      "recordingLocales": ["en-US"],
      "encounterReportLocale": "en-US",
      "encounterUxLocale": "en-US"
    }
  },
  "actions": ["generate-draft"],
  "reason": "ui",
  "startingOffset": 0
}

Note

RecordingOpen does not produce an immediate response. If the request is invalid, the server closes the connection with an error close code.
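
Because an invalid RecordingOpen only surfaces as a closed connection, it can help to validate the body locally before sending it. The sketch below covers only the rules stated in the field table; `validate_recording_open` is a hypothetical helper and the server may enforce additional constraints.

```python
DATA_FORMATS = ("pcm", "opus", "webmOpus", "byteStream")
REQUIRED_SESSION_FIELDS = ("productId", "partnerId", "customerId", "correlationId")


def validate_recording_open(body):
    """Return a list of problems found in a RecordingOpen body (empty if none)."""
    problems = []
    if not body.get("recordingId"):
        problems.append("recordingId is required")
    # dataFormat must contain exactly one of the supported encodings.
    chosen = [name for name in DATA_FORMATS if name in body.get("dataFormat", {})]
    if len(chosen) != 1:
        problems.append("dataFormat must contain exactly one of: " + ", ".join(DATA_FORMATS))
    session = body.get("ambientSessionData", {})
    for field in REQUIRED_SESSION_FIELDS:
        if not session.get(field):
            problems.append(f"ambientSessionData.{field} is required")
    if body.get("startingOffset", 0) < 0:
        problems.append("startingOffset cannot be negative")
    return problems
```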

Message: DataChunk

Streams a chunk of audio data.

WebSocket message type: Binary

DataChunk messages are sent as binary WebSocket frames containing a JSON-serialized object. They do not use the Path=/X-MS-Request-Id=/X-Timestamp= header format.

Field	Type	Required	Description
`DataStart`	uint32	Yes	Byte offset where this chunk begins within the recording.
`Data`	byte[]	Yes	Raw audio bytes (base64-encoded in JSON representation).

Note

DataChunk uses PascalCase field names (DataStart, Data).

Example (binary frame content):

{
  "DataStart": 0,
  "Data": "<base64-encoded-audio-bytes>"
}
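
A minimal sketch of producing DataChunk frames from a buffer of audio bytes follows. The helper name `iter_data_chunks`, the chunk size, and the UTF-8 encoding of the serialized JSON are assumptions made for illustration.

```python
import base64
import json


def iter_data_chunks(audio: bytes, chunk_size: int, starting_offset: int = 0):
    """Yield binary frame payloads, each a JSON object with DataStart and Data."""
    for start in range(starting_offset, len(audio), chunk_size):
        piece = audio[start:start + chunk_size]
        frame = {
            "DataStart": start,  # byte offset of this chunk within the recording
            "Data": base64.b64encode(piece).decode("ascii"),
        }
        # Assumption: the JSON text is sent as UTF-8 bytes in a binary frame.
        yield json.dumps(frame).encode("utf-8")
```

Passing a non-zero `starting_offset` skips bytes that were already sent, which is convenient when resuming a recording.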

Server response - DataStorageResponse (text message):

Sent approximately every 10 KB of stored data. Not returned for every DataChunk.

{ "dataStored": { "dataStored": 32768 } }

Note

Streaming responses (DataStorageResponse, RecordingCloseResponse) use camelCase field names because they are serialized using protobuf's JSON formatter. This differs from unary responses (RetrieveConfiguration, StartProcessing), which use PascalCase because they are serialized using Newtonsoft.Json.

Note

When streaming resumes after an interruption, the first DataStorageResponse contains the byte position stored prior to the interruption. Duplicate bytes are silently discarded.
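
One way to keep a resume point is to record the highest acknowledged byte count as DataStorageResponse messages arrive. This is a sketch; `update_confirmed_offset` is a hypothetical helper, and it ignores any message that is not a DataStorageResponse.

```python
import json


def update_confirmed_offset(current, server_message):
    """Return the highest confirmed byte count seen so far."""
    payload = json.loads(server_message)
    ack = payload.get("dataStored")
    if isinstance(ack, dict) and "dataStored" in ack:
        return max(current, int(ack["dataStored"]))
    return current  # not a DataStorageResponse; keep the previous value
```

After an interruption, the tracked value can be sent as `startingOffset` in the next RecordingOpen.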

Message: RecordingClose

Ends the recording.

Message path: RecordingClose
WebSocket message type: Text

Field	Type	Required	Description
`recordingId`	string	Yes	Must match the `recordingId` from RecordingOpen.
`recordingLengthSeconds`	uint32	Yes	Total recording duration in seconds.
`reason`	string	No	Why the recording was stopped. Values: `"ui"`, `"voiceCommand"`, `"btDisconnected"`, `"externalInterruption"`, `"unexpectedError"`, `"maxDurationExceeded"`.

Example request:

Path=RecordingClose
X-MS-Request-Id=12345678-1234-1234-1234-123456789013
X-Timestamp=2025-08-11T17:17:07.812Z

{
  "recordingId": "<recording-uuid>",
  "recordingLengthSeconds": 120,
  "reason": "ui"
}

Server response - RecordingCloseResponse (text message):

{ "recordingClosed": { "dataStored": 65536 } }

Streaming errors

Scenario	WebSocket close code	Description
Normal completion	1000 (NormalClosure)	Graceful close after RecordingClose.
Invalid message body	1007 (InvalidPayloadData)	Cannot parse message as expected format.
Unknown message path	1007 (InvalidPayloadData)	Message path not recognized.
Protocol error	1002 (ProtocolError)	Malformed message format.
Internal write failure	1011 (InternalServerError)	`"Resource exhausted please try again later."`
Unexpected server exception	1011 (InternalServerError)	Server error during processing.
Negative `startingOffset`	1007 (InvalidPayloadData)	`"StartingOffset cannot be negative"`.
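
A client could map close codes to a follow-up action roughly as follows. The numeric values are the standard RFC 6455 close codes (1007 is the registered code for invalid payload data); the action labels and the decision to retry only on 1011 are assumptions of this sketch, not documented service behavior.

```python
CLOSE_CODE_NAMES = {
    1000: "NormalClosure",
    1002: "ProtocolError",
    1007: "InvalidPayloadData",
    1011: "InternalServerError",
}


def classify_close(code):
    """Suggest what the client should do after the server closes the socket."""
    if code == 1000:
        return "done"
    if code == 1011:
        return "retry"  # assumption: server-side failures are worth retrying
    if code in (1002, 1007):
        return "fix-request"  # the message was malformed or invalid
    return "unknown"
```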

GET /ws/startProcessing

Type: Unary (short-lived WebSocket connection)

Signals Dragon Copilot to begin processing a previously recorded ambient session. Processing happens asynchronously; the response confirms the request was accepted.

Connection lifecycle

1. Client sends HTTP GET /ws/startProcessing with Upgrade header and Authorization token
2. Server validates token and accepts WebSocket upgrade (101 Switching Protocols)
3. Client sends a single text message with Path=StartProcessing and JSON body
4. Server returns the processing response
5. Server closes WebSocket connection (1000 Normal Closure)

Request

Message path: StartProcessing

Field	Type	Required	Description
`ambientSessionData`	object	Yes	Session metadata. Must include `productId`, `partnerId`, `customerId`, and `correlationId`.
`actions`	string[]	Yes	AI actions to perform. Must not be empty. Use `"generate-draft"` for draft generation.
`requestTime`	string (ISO 8601)	No	Timestamp when the user triggered processing.
`recordingsToProcess`	string[]	No	Recording IDs to include. If omitted, all recordings for the session are processed.

Example request:

Path=StartProcessing
X-MS-Request-Id=516c8e09-f9c2-4ff2-8bc9-9d4e89f921fb
X-Timestamp=2025-08-11T19:29:27.608Z

{
  "ambientSessionData": {
    "productId": "<product-guid>",
    "partnerId": "<partner-guid>",
    "customerId": "<customer-guid>",
    "correlationId": "<correlation-guid>",
    "practitionerInfo": {
      "externalIdentifiers": [
        { "type": "fhirId", "identifier": "<practitioner-fhir-id>" }
      ],
      "name": {
        "givenName": "Jane",
        "familyName": "Smith",
        "suffix": "MD"
      }
    },
    "externalIdentifiers": [
      { "type": "userId", "identifier": "<external-user-id>" }
    ]
  },
  "actions": ["generate-draft"],
  "recordingsToProcess": ["<recording-uuid-1>", "<recording-uuid-2>"]
}

Response

The server responds with a text message in the format StartProcessing: {json}.

Field	Type	Description
`StreamingResponse.ErrorCode`	uint32	`0` indicates success. Non-zero indicates an error.
`StreamingResponse.ErrorMessage`	string	Human-readable message.
`StreamingResponse.DetailedErrorInformation`	string	Additional diagnostic information.

Success response:

StartProcessing: {"StreamingResponse":{"ErrorCode":0,"ErrorMessage":"","DetailedErrorInformation":""}}

Error response:

StartProcessing: {"StreamingResponse":{"ErrorCode":1,"ErrorMessage":"Processing failed","DetailedErrorInformation":"Session not found"}}

Note

Unary responses are prefixed with the message path and a colon. Response field names are PascalCase.
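
The response can be checked with a few lines of client code. In this sketch, `ProcessingRejected` and `check_start_processing` are hypothetical names, and raising an exception on a non-zero `ErrorCode` is one possible design rather than a requirement.

```python
import json


class ProcessingRejected(Exception):
    """Raised when StartProcessing reports a non-zero ErrorCode."""


def check_start_processing(message):
    """Decode a StartProcessing response and raise if it reports an error."""
    prefix, separator, payload = message.partition(":")
    if not separator or prefix.strip() != "StartProcessing":
        raise ValueError("expected a StartProcessing response")
    result = json.loads(payload)["StreamingResponse"]
    if result["ErrorCode"] != 0:
        raise ProcessingRejected(
            f'{result["ErrorCode"]}: {result["ErrorMessage"]} ({result["DetailedErrorInformation"]})'
        )
    return result
```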

Usage notes

actions is required and must not be empty. Use "generate-draft" to trigger AI draft generation.
recordingsToProcess is optional. When omitted, all recordings for the session are processed.
Processing is asynchronous: the response confirms the request was accepted, not that processing is complete.

Errors

Scenario	Result
Missing required fields	Server closes with code 1011 (InternalServerError).
Invalid request body	Server closes with code 1011 (InternalServerError).
Unknown message path	Server closes with code 1007 (InvalidPayloadData), reason: `"Unknown message path"`.
Downstream processing failure	`ErrorCode` is non-zero in the response body.
Non-WebSocket request	HTTP 400.
Missing or invalid token	HTTP 401.

Common types

AmbientSession fields

Field	Type	Required	Description
`productId`	string	Yes	Microsoft unique identifier of the product.
`partnerId`	string	Yes	Microsoft unique identifier for the partner.
`customerId`	string	Yes	Microsoft unique identifier of the customer.
`correlationId`	string	Yes	Partner-assigned unique identifier of the session (GUID). Used to correlate results.
`practitionerInfo`	object	No	Practitioner metadata (identifiers, name, specialty).
`ehrInstanceId`	string	No	EHR instance identifier.
`externalIdentifiers`	ExternalIdentifier[]	No	External identifiers. Use type `"userId"` for the partner's user identifier.
`creationDate`	string (ISO 8601)	No	Session creation timestamp.
`dstOffsetSeconds`	int32	No	DST offset in seconds.
`clientInfo`	object	No	Client application metadata (applicationId, applicationVersion, sdkVersion, deviceId, deviceType).
`localeInfo`	object	No	Locale preferences: `recordingLocales`, `encounterReportLocale`, `encounterUxLocale`.

ExternalIdentifier

Field	Type	Description
`type`	string	Identifier type (for example, `"userId"`, `"fhirId"`, `"npi"`, `"encounterId"`).
`identifier`	string	The identifier value.

Best practices

Call retrieveConfiguration before recording to confirm supported locales and duration limits.
Implement reconnection logic with exponential backoff for the bidirectional streaming endpoint.
Track dataStored values to set startingOffset correctly when resuming after interruption.
Include previousEncounterSessions when splitting recordings across multiple sessions.
Call startProcessing after closing the recording. It requires a separate WebSocket connection.
Monitor WebSocket close codes to distinguish normal closure from errors.
Handle backpressure - don't send audio faster than the server can process.
Initiate graceful shutdown - send a close frame rather than abruptly disconnecting.
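
The reconnection advice above can be sketched as a delay schedule. The helper `backoff_delays` and its default values (1 second base, factor of 2, 30 second cap) are assumptions chosen for illustration; the service does not prescribe specific values.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=30.0, attempts=5, jitter=0.0, rng=random.random):
    """Yield capped exponential delays in seconds, with optional random jitter."""
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        # jitter=0.25 adds up to 25% extra so clients do not reconnect in lockstep.
        yield delay + jitter * delay * rng()
```

A client would sleep for each yielded delay before reopening GET /ws with `startingOffset` set to the last confirmed `dataStored` value.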

Last updated on 2026-05-07
