The Ambient Audio Streaming (AAS) 2.0 WebSocket API enables partners to stream ambient audio recordings in real time and submit them for downstream processing by Dragon Copilot.
The WebSocket transport is one of two streaming options (alongside gRPC). It exposes three endpoints:
| Endpoint | Type | Description |
|---|---|---|
| GET /ws/retrieveConfiguration | Unary | Returns the service configuration for a given partner, including supported audio formats, locale settings, and operational limits. |
| GET /ws | Bidirectional streaming | Streams an ambient audio recording to the server. The client opens a session, streams audio data, and closes the recording. |
| GET /ws/startProcessing | Unary | Signals Dragon Copilot to begin processing a previously recorded ambient session. |
Authentication
All WebSocket endpoints require bearer token authentication. Tokens can be passed via the Authorization header or the Sec-WebSocket-Protocol subprotocol header.
Supported token types:
- S2S (Server-to-Server): Machine-to-machine token issued via MISE. After authentication, the service validates the calling application's identity against a configured allowlist.
- Entra ID User Token: User-delegated token issued by Microsoft Entra ID.
- EIS Bearer Token: JWT issued by the EHR Integration Service (EIS). See Token launch integration for details.
Required headers
| Header | Description |
|---|---|
| Authorization | Bearer token (Bearer <token>). Alternatively, pass via Sec-WebSocket-Protocol. |
| customer-id | Customer/environment identifier. Returns 403 Forbidden if missing. |
Conditionally required headers
| Header | Condition | Description |
|---|---|---|
| user-guid or external-user-id | When using M2M (S2S) token | At least one must be provided. Returns 403 Forbidden if both are missing. |
Optional headers
| Header | Description |
|---|---|
| product-id | Product identifier (used for license validation context). |
Authentication methods
Method 1: Authorization header
GET /ws HTTP/1.1
Host: ambient-audio-service.copilot.us.dragon.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
Authorization: Bearer <token>
customer-id: <customer-uuid>
external-user-id: <user-id>
Method 2: Sec-WebSocket-Protocol
The browser WebSocket API (new WebSocket(url, protocols)) does not allow setting custom headers like Authorization. To pass authentication data during the WebSocket handshake from browser-based clients, use the Sec-WebSocket-Protocol header as a comma-delimited key-value list.
The service supports two subprotocol formats:
Format A: Simple Bearer prefix
Pass the token directly with a Bearer prefix. Use this when you can still send customer-id and the other headers as separate HTTP headers but cannot set the Authorization header itself (for example, in non-browser clients that support subprotocols but restrict Authorization on the upgrade request):
GET /ws HTTP/1.1
Host: streaming.ambient-audio-service.copilot.us.dragon.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: Bearer <token>
customer-id: <customer-uuid>
external-user-id: <user-id>
Format B: Key-value list (browser SDK)
Pass the token and customer-id together in the subprotocol list. Use this when you cannot set any custom headers (for example, the browser WebSocket API):
GET /ws HTTP/1.1
Host: streaming.ambient-audio-service.copilot.us.dragon.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: sec-websocket-protocol, <token>, customer-id, <customer-uuid>
Parsing rules for Format B:
| Position | Value | Description |
|---|---|---|
| 0 | sec-websocket-protocol | Key marker. Also echoed in the server's response header for WebSocket handshake compliance. |
| 1 | <token> | The raw JWT token (without Bearer prefix). |
| 2 | customer-id | Key for customer/environment identifier. |
| 3 | <customer-uuid> | Value of customer-id. |
- Values are comma-delimited and trimmed of whitespace.
- The server locates keys by name (not strictly by position), then reads the next value as the associated value.
- Only customer-id can be passed in the subprotocol list. The user-guid, external-user-id, and product-id headers must still be sent as separate HTTP headers on the upgrade request.
Note
In JavaScript browser clients, each value is passed as a separate subprotocol: new WebSocket(url, ["sec-websocket-protocol", token, "customer-id", customerId]). The browser serializes these as the comma-separated Sec-WebSocket-Protocol header.
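The Format B parsing rules can be illustrated with a small Python sketch. It is not the service's implementation; the function names `build_format_b` and `parse_format_b` are hypothetical, and the parser simply applies the rules as stated above (split on commas, trim whitespace, find each key by name, take the next entry as its value).

```python
def build_format_b(token: str, customer_id: str) -> list:
    """Return the subprotocol values in the order shown in the Format B table."""
    return ["sec-websocket-protocol", token, "customer-id", customer_id]


def parse_format_b(header_value: str) -> dict:
    """Parse a Sec-WebSocket-Protocol value following the documented rules.

    Assumption: only the two keys documented for Format B are recognized.
    """
    parts = [part.strip() for part in header_value.split(",")]
    result = {}
    for key in ("sec-websocket-protocol", "customer-id"):
        if key in parts:
            index = parts.index(key)
            # The entry that follows a key is treated as that key's value.
            if index + 1 < len(parts):
                result[key] = parts[index + 1]
    return result


header = ", ".join(build_format_b("<token>", "<customer-uuid>"))
parsed = parse_format_b(header)
```

Because keys are located by name, a list such as `customer-id, <customer-uuid>, sec-websocket-protocol, <token>` parses to the same result in this sketch.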
Validation responses
| Scenario | Result |
|---|---|
| Missing valid token | 401 Unauthorized |
| Missing customer-id | 403 Forbidden |
| Missing both user-guid and external-user-id with M2M token | 403 Forbidden |
| Failed license check | 403 Forbidden |
| Valid token and headers | WebSocket upgrade proceeds (101 Switching Protocols) |
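A client can mirror the table above as a preflight check before opening a connection. The sketch below is a hypothetical helper, not part of the API: `expected_handshake_status` and its parameters are illustrative names, the order of the checks (401 before 403) is an assumption, and the server-side license check is not modeled.

```python
def expected_handshake_status(headers: dict, *, has_valid_token: bool, is_m2m_token: bool) -> int:
    """Predict the handshake result from the documented validation scenarios."""
    # Header names are compared case-insensitively; empty values count as missing.
    present = {name.lower() for name, value in headers.items() if value}
    if not has_valid_token:
        return 401  # Missing valid token
    if "customer-id" not in present:
        return 403  # Missing customer-id
    if is_m2m_token and not ({"user-guid", "external-user-id"} & present):
        return 403  # M2M token without user-guid or external-user-id
    return 101  # Upgrade proceeds (license check happens server-side)


status = expected_handshake_status(
    {"customer-id": "<customer-uuid>", "external-user-id": "<user-id>"},
    has_valid_token=True,
    is_m2m_token=True,
)
```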
Message format
All text-based WebSocket messages use the following header format:
Path=<message-path>
X-MS-Request-Id=<guid-request-id>
X-Timestamp=<iso8601-utc-timestamp>

{JSON body}
Format rules:
- Headers are separated by \r\n (CRLF)
- Headers and body are separated by a blank line (\r\n\r\n)
- Valid message paths: RetrieveConfiguration, RecordingOpen, RecordingClose, StartProcessing
- X-MS-Request-Id must be a valid GUID
- X-Timestamp must be ISO 8601 UTC format (for example, 2025-08-11T16:45:00.547Z)
Example message:
Path=RecordingOpen
X-MS-Request-Id=12345678-1234-1234-1234-123456789012
X-Timestamp=2025-08-11T16:45:00.547Z

{ "recordingId": "...", "dataFormat": {...}, ... }
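The framing rules above can be sketched in Python as follows. `build_message` and `parse_message` are hypothetical helper names, and the JSON body is serialized with the standard library's default settings, which is an assumption rather than a documented requirement.

```python
import json
import uuid
from datetime import datetime, timezone

VALID_PATHS = {"RetrieveConfiguration", "RecordingOpen", "RecordingClose", "StartProcessing"}


def build_message(path: str, body: dict, request_id: str = None, timestamp: str = None) -> str:
    """Assemble a text message: CRLF-separated headers, blank line, JSON body."""
    if path not in VALID_PATHS:
        raise ValueError(f"unknown message path: {path}")
    request_id = request_id or str(uuid.uuid4())
    uuid.UUID(request_id)  # raises ValueError if the request ID is not a valid GUID
    if timestamp is None:
        now = datetime.now(timezone.utc)
        # ISO 8601 UTC with millisecond precision, e.g. 2025-08-11T16:45:00.547Z
        timestamp = now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"
    headers = [f"Path={path}", f"X-MS-Request-Id={request_id}", f"X-Timestamp={timestamp}"]
    return "\r\n".join(headers) + "\r\n\r\n" + json.dumps(body)


def parse_message(text: str):
    """Split a framed message back into (headers dict, JSON body)."""
    head, _, body = text.partition("\r\n\r\n")
    headers = dict(line.split("=", 1) for line in head.split("\r\n"))
    return headers, json.loads(body)


message = build_message(
    "RecordingOpen",
    {"recordingId": "r1"},
    request_id="12345678-1234-1234-1234-123456789012",
    timestamp="2025-08-11T16:45:00.547Z",
)
```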
Endpoints
GET /ws/retrieveConfiguration
Type: Unary (short-lived WebSocket connection)
Retrieves the service configuration for a given product, partner, and customer. The response includes supported audio locales and recording duration limits. Call this before starting a recording session.
Connection lifecycle
1. Client sends HTTP GET /ws/retrieveConfiguration with Upgrade header and Authorization token
2. Server validates token and accepts WebSocket upgrade (101 Switching Protocols)
3. Client sends a single text message with Path=RetrieveConfiguration and JSON body
4. Server returns the configuration response
5. Server closes WebSocket connection (1000 Normal Closure)
Request
Message path: RetrieveConfiguration
| Field | Type | Required | Description |
|---|---|---|---|
| productId | string | Yes | The Microsoft unique identifier of the product. |
| partnerId | string | Yes | The Microsoft unique identifier for the partner. |
| customerId | string | Yes | The Microsoft unique identifier of the customer. |
| externalIdentifiers | ExternalIdentifier[] | No | List of external identifiers. Use type "userId" for the partner's user identifier. |
Example request:
Path=RetrieveConfiguration
X-MS-Request-Id=7a1223d0-272b-4f14-9f5c-e2ff5efd775e
X-Timestamp=2025-08-10T17:49:49.739Z

{
  "productId": "<product-guid>",
  "partnerId": "<partner-guid>",
  "customerId": "<customer-guid>",
  "externalIdentifiers": [
    { "type": "userId", "identifier": "<external-user-id>" }
  ]
}
Response
The server responds with a text message in the format RetrieveConfiguration: {json}.
| Field | Type | Description |
|---|---|---|
| EncounterWarnSeconds | uint32 | Duration in seconds at which processing quality may degrade. Warn the user. |
| EncounterMaxSeconds | uint32 | Maximum duration in seconds of audio allowed. Stop recording at this limit. |
| SupportedRecordingLocales | string[] | Locales accepted for audio recording input (IETF BCP 47). |
| SupportedEncounterReportLocales | string[] | Locales available for encounter report output (IETF BCP 47). |
Example response:
RetrieveConfiguration: {"EncounterWarnSeconds":2700,"EncounterMaxSeconds":4500,"SupportedRecordingLocales":["en-US","de-DE","es-US","fr-FR"],"SupportedEncounterReportLocales":["en-US","de-DE","es-US","fr-FR"]}
Note
Unary responses are prefixed with the message path and a colon (for example, RetrieveConfiguration: {...}). Response field names are PascalCase.
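As an illustration, the prefixed response can be split and the duration limits applied on the client. This is a minimal sketch; `parse_unary_response` and `duration_state` are hypothetical names, and the three-state result ("ok", "warn", "stop") is just one way to act on the two thresholds.

```python
import json


def parse_unary_response(text: str, expected_path: str) -> dict:
    """Strip the '<path>:' prefix from a unary response and decode the JSON."""
    prefix, separator, payload = text.partition(":")
    if not separator or prefix.strip() != expected_path:
        raise ValueError(f"expected a {expected_path} response")
    return json.loads(payload)


def duration_state(elapsed_seconds: int, config: dict) -> str:
    """Map elapsed recording time to an action based on the configured limits."""
    if elapsed_seconds >= config["EncounterMaxSeconds"]:
        return "stop"  # maximum audio duration reached
    if elapsed_seconds >= config["EncounterWarnSeconds"]:
        return "warn"  # processing quality may degrade
    return "ok"


raw = 'RetrieveConfiguration: {"EncounterWarnSeconds":2700,"EncounterMaxSeconds":4500,"SupportedRecordingLocales":["en-US","de-DE","es-US","fr-FR"],"SupportedEncounterReportLocales":["en-US","de-DE","es-US","fr-FR"]}'
config = parse_unary_response(raw, "RetrieveConfiguration")
```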
Errors
| Scenario | Result |
|---|---|
| Invalid or missing required fields | Server closes with code 1011 (InternalServerError). |
| Unknown message path | Server closes with code 1007 (InvalidPayloadData), reason: "Unknown message path". |
| Failure communicating with downstream system | Server closes with code 1011 (InternalServerError). |
| Non-WebSocket request | HTTP 400. |
| Missing or invalid token | HTTP 401. |
GET /ws
Type: Bidirectional streaming (long-lived WebSocket connection)
Streams an ambient audio recording to the server. The connection supports three client message types:
- RecordingOpen (text) - Initialize the recording session.
- DataChunk (binary) - Stream audio data.
- RecordingClose (text) - End the recording.
The server responds with DataStorageResponse acknowledgments and a final RecordingCloseResponse.
Connection lifecycle
1. Client sends HTTP GET /ws with Upgrade header and Authorization token
2. Server validates token and accepts WebSocket upgrade (101 Switching Protocols)
3. Client sends RecordingOpen (text message) to initialize the recording session
4. Client sends DataChunk messages (binary) to stream audio data
5. Server sends DataStorageResponse (text) approximately every 10 KB of stored data
6. Client sends RecordingClose (text message) to end the recording
7. Server sends RecordingCloseResponse (text) confirming total bytes stored
8. Connection closes (1000 Normal Closure)
Message: RecordingOpen
Initializes the recording session. Must be the first message sent on the connection.
Message path: RecordingOpen
WebSocket message type: Text
| Field | Type | Required | Description |
|---|---|---|---|
| recordingId | string | Yes | A caller-defined unique identifier for the recording. |
| dataFormat | object | Yes | Audio encoding format. Must contain exactly one of: pcm, opus, webmOpus, or byteStream. |
| dataFormat.pcm | object | - | Signed 16-bit LE PCM. Fields: sampleRateHz, bitcount, channels. |
| dataFormat.opus | object | - | Ogg Opus. Fields: sampleRateHz. |
| dataFormat.webmOpus | object | - | WebM Opus. Fields: sampleRateHz. |
| dataFormat.byteStream | object | - | Opaque byte stream. Fields: formatSpecifier. |
| ambientSessionData | object | Yes | Session metadata. Must include productId, partnerId, customerId, and correlationId. |
| actions | string[] | No | AI actions to perform. Use "generate-draft" for draft generation. If omitted, only a transcript is generated. |
| reason | string | No | Why the recording was started. Values: "ui", "wakeWord", "systemResume". |
| startingOffset | uint32 | No | Byte offset for resuming after interruption. Set to last confirmed dataStored value. |
| previousEncounterSessions | object[] | No | Previous sessions in the encounter. Each: sessionId, creationDate, sessionLengthSeconds. |
| outputFormIds | string[] | No | Template form identifiers for note generation. |
Example request:
Path=RecordingOpen
X-MS-Request-Id=12345678-1234-1234-1234-123456789012
X-Timestamp=2025-08-11T16:45:00.547Z

{
  "recordingId": "<recording-uuid>",
  "dataFormat": {
    "pcm": {
      "sampleRateHz": 16000,
      "bitcount": 16,
      "channels": 1
    }
  },
  "ambientSessionData": {
    "productId": "<product-guid>",
    "partnerId": "<partner-guid>",
    "customerId": "<customer-guid>",
    "correlationId": "<correlation-guid>",
    "practitionerInfo": {
      "externalIdentifiers": [
        { "type": "fhirId", "identifier": "<practitioner-fhir-id>" }
      ],
      "name": {
        "givenName": "Jane",
        "familyName": "Smith",
        "suffix": "MD"
      }
    },
    "externalIdentifiers": [
      { "type": "userId", "identifier": "<external-user-id>" }
    ],
    "creationDate": "2026-02-20T14:30:00.000Z",
    "localeInfo": {
      "recordingLocales": ["en-US"],
      "encounterReportLocale": "en-US",
      "encounterUxLocale": "en-US"
    }
  },
  "actions": ["generate-draft"],
  "reason": "ui",
  "startingOffset": 0
}
Note
RecordingOpen does not produce an immediate response. If the request is invalid, the server closes the connection with an error close code.
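Because an invalid RecordingOpen only surfaces as a closed connection, it can help to check the documented constraints on the client before sending. The following sketch assumes a hypothetical helper named `build_recording_open`; it enforces only the rules stated in the field table (exactly one data format, the four required session fields, a non-negative offset).

```python
ALLOWED_FORMATS = {"pcm", "opus", "webmOpus", "byteStream"}
REQUIRED_SESSION_FIELDS = ("productId", "partnerId", "customerId", "correlationId")


def build_recording_open(recording_id: str, data_format: dict, session: dict,
                         *, actions=None, starting_offset: int = 0) -> dict:
    """Build a RecordingOpen JSON body after basic client-side validation."""
    if len(data_format) != 1 or not set(data_format) <= ALLOWED_FORMATS:
        raise ValueError("dataFormat must contain exactly one of pcm, opus, webmOpus, byteStream")
    missing = [name for name in REQUIRED_SESSION_FIELDS if not session.get(name)]
    if missing:
        raise ValueError(f"ambientSessionData is missing: {', '.join(missing)}")
    if starting_offset < 0:
        raise ValueError("startingOffset cannot be negative")
    body = {
        "recordingId": recording_id,
        "dataFormat": data_format,
        "ambientSessionData": session,
        "startingOffset": starting_offset,
    }
    if actions:
        body["actions"] = list(actions)  # omitted means transcript only
    return body


open_body = build_recording_open(
    "<recording-uuid>",
    {"pcm": {"sampleRateHz": 16000, "bitcount": 16, "channels": 1}},
    {"productId": "<product-guid>", "partnerId": "<partner-guid>",
     "customerId": "<customer-guid>", "correlationId": "<correlation-guid>"},
    actions=["generate-draft"],
)
```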
Message: DataChunk
Streams a chunk of audio data.
WebSocket message type: Binary
DataChunk messages are sent as binary WebSocket frames containing a JSON-serialized object. They do not use the Path=/X-MS-Request-Id=/X-Timestamp= header format.
| Field | Type | Required | Description |
|---|---|---|---|
| DataStart | uint32 | Yes | Byte offset where this chunk begins within the recording. |
| Data | byte[] | Yes | Raw audio bytes (base64-encoded in JSON representation). |
Note
DataChunk uses PascalCase field names (DataStart, Data).
Example (binary frame content):
{
  "DataStart": 0,
  "Data": "<base64-encoded-audio-bytes>"
}
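One way to produce these frames is sketched below in Python. The generator name `data_chunk_frames` and the chunk size are illustrative assumptions; the sketch only shows how DataStart tracks the byte offset and how Data is base64-encoded before the JSON is sent as a binary frame.

```python
import base64
import json


def data_chunk_frames(audio: bytes, chunk_size: int, start_offset: int = 0):
    """Yield binary frame payloads (UTF-8 JSON) covering audio[start_offset:]."""
    for position in range(start_offset, len(audio), chunk_size):
        piece = audio[position:position + chunk_size]
        payload = {
            "DataStart": position,  # byte offset of this chunk within the recording
            "Data": base64.b64encode(piece).decode("ascii"),
        }
        yield json.dumps(payload).encode("utf-8")


frames = list(data_chunk_frames(b"abcdefghij", chunk_size=4))
```

Passing a non-zero `start_offset` skips bytes the server has already stored, which fits the resume behavior of startingOffset.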
Server response - DataStorageResponse (text message):
Sent approximately every 10 KB of stored data. Not returned for every DataChunk.
{ "dataStored": { "dataStored": 32768 } }
Note
Streaming responses (DataStorageResponse, RecordingCloseResponse) use camelCase field names because they are serialized using protobuf's JSON formatter. This differs from unary responses (RetrieveConfiguration, StartProcessing), which use PascalCase because they are serialized using Newtonsoft.Json.
Note
When streaming resumes after an interruption, the first DataStorageResponse contains the byte position stored prior to the interruption. Duplicate bytes are silently discarded.
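To resume correctly, the client needs the last confirmed dataStored value. The sketch below shows one way to track it; `StoredOffsetTracker` is a hypothetical class, and keeping the maximum value seen is an assumption that acknowledgments never move backwards.

```python
import json


class StoredOffsetTracker:
    """Tracks the highest byte offset acknowledged via DataStorageResponse."""

    def __init__(self) -> None:
        self.confirmed = 0

    def on_server_text(self, text: str) -> None:
        # DataStorageResponse looks like {"dataStored": {"dataStored": 32768}}.
        try:
            message = json.loads(text)
        except json.JSONDecodeError:
            return  # not a JSON acknowledgment; ignored in this sketch
        if not isinstance(message, dict):
            return
        inner = message.get("dataStored")
        if isinstance(inner, dict) and isinstance(inner.get("dataStored"), int):
            self.confirmed = max(self.confirmed, inner["dataStored"])

    def resume_plan(self, audio: bytes):
        """Return (startingOffset, bytes still to send) for a new RecordingOpen."""
        return self.confirmed, audio[self.confirmed:]


tracker = StoredOffsetTracker()
tracker.on_server_text('{ "dataStored": { "dataStored": 32768 } }')
offset, remaining = tracker.resume_plan(bytes(40000))
```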
Message: RecordingClose
Ends the recording.
Message path: RecordingClose
WebSocket message type: Text
| Field | Type | Required | Description |
|---|---|---|---|
| recordingId | string | Yes | Must match the recordingId from RecordingOpen. |
| recordingLengthSeconds | uint32 | Yes | Total recording duration in seconds. |
| reason | string | No | Why the recording was stopped. Values: "ui", "voiceCommand", "btDisconnected", "externalInterruption", "unexpectedError", "maxDurationExceeded". |
Example request:
Path=RecordingClose
X-MS-Request-Id=12345678-1234-1234-1234-123456789013
X-Timestamp=2025-08-11T17:17:07.812Z

{
  "recordingId": "<recording-uuid>",
  "recordingLengthSeconds": 120,
  "reason": "ui"
}
Server response - RecordingCloseResponse (text message):
{ "recordingCloses": { "dataStored": 65536 } }
Streaming errors
| Scenario | WebSocket close code | Description |
|---|---|---|
| Normal completion | 1000 (NormalClosure) | Graceful close after RecordingClose. |
| Invalid message body | 1007 (InvalidPayloadData) | Cannot parse message as expected format. |
| Unknown message path | 1007 (InvalidPayloadData) | Message path not recognized. |
| Protocol error | 1002 (ProtocolError) | Malformed message format. |
| Internal write failure | 1011 (InternalServerError) | "Resource exhausted please try again later." |
| Unexpected server exception | 1011 (InternalServerError) | Server error during processing. |
| Negative startingOffset | 1007 (InvalidPayloadData) | "StartingOffset cannot be negative". |
GET /ws/startProcessing
Type: Unary (short-lived WebSocket connection)
Signals Dragon Copilot to begin processing a previously recorded ambient session. Processing happens asynchronously; the response confirms the request was accepted.
Connection lifecycle
1. Client sends HTTP GET /ws/startProcessing with Upgrade header and Authorization token
2. Server validates token and accepts WebSocket upgrade (101 Switching Protocols)
3. Client sends a single text message with Path=StartProcessing and JSON body
4. Server returns the processing response
5. Server closes WebSocket connection (1000 Normal Closure)
Request
Message path: StartProcessing
| Field | Type | Required | Description |
|---|---|---|---|
| ambientSessionData | object | Yes | Session metadata. Must include productId, partnerId, customerId, and correlationId. |
| actions | string[] | Yes | AI actions to perform. Must not be empty. Use "generate-draft" for draft generation. |
| requestTime | string (ISO 8601) | No | Timestamp when the user triggered processing. |
| recordingsToProcess | string[] | No | Recording IDs to include. If omitted, all recordings for the session are processed. |
Example request:
Path=StartProcessing
X-MS-Request-Id=516c8e09-f9c2-4ff2-8bc9-9d4e89f921fb
X-Timestamp=2025-08-11T19:29:27.608Z

{
  "ambientSessionData": {
    "productId": "<product-guid>",
    "partnerId": "<partner-guid>",
    "customerId": "<customer-guid>",
    "correlationId": "<correlation-guid>",
    "practitionerInfo": {
      "externalIdentifiers": [
        { "type": "fhirId", "identifier": "<practitioner-fhir-id>" }
      ],
      "name": {
        "givenName": "Jane",
        "familyName": "Smith",
        "suffix": "MD"
      }
    },
    "externalIdentifiers": [
      { "type": "userId", "identifier": "<external-user-id>" }
    ]
  },
  "actions": ["generate-draft"],
  "recordingsToProcess": ["<recording-uuid-1>", "<recording-uuid-2>"]
}
Response
The server responds with a text message in the format StartProcessing: {json}.
| Field | Type | Description |
|---|---|---|
| StreamingResponse.ErrorCode | uint32 | 0 indicates success. Non-zero indicates an error. |
| StreamingResponse.ErrorMessage | string | Human-readable message. |
| StreamingResponse.DetailedErrorInformation | string | Additional diagnostic information. |
Success response:
StartProcessing: {"StreamingResponse":{"ErrorCode":0,"ErrorMessage":"","DetailedErrorInformation":""}}
Error response:
StartProcessing: {"StreamingResponse":{"ErrorCode":1,"ErrorMessage":"Processing failed","DetailedErrorInformation":"Session not found"}}
Note
Unary responses are prefixed with the message path and a colon. Response field names are PascalCase.
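A client might check the StartProcessing response along the following lines. This is a sketch only; `ProcessingRejected` and `check_start_processing` are hypothetical names, and raising an exception on a non-zero ErrorCode is a design choice rather than documented behavior.

```python
import json


class ProcessingRejected(Exception):
    """Raised when StreamingResponse.ErrorCode is non-zero."""


def check_start_processing(text: str) -> None:
    """Validate a 'StartProcessing: {json}' response; raise if it reports an error."""
    prefix, separator, payload = text.partition(":")
    if not separator or prefix.strip() != "StartProcessing":
        raise ValueError("not a StartProcessing response")
    status = json.loads(payload)["StreamingResponse"]
    if status["ErrorCode"] != 0:
        raise ProcessingRejected(
            f'{status["ErrorCode"]}: {status["ErrorMessage"]} '
            f'({status["DetailedErrorInformation"]})'
        )


success = 'StartProcessing: {"StreamingResponse":{"ErrorCode":0,"ErrorMessage":"","DetailedErrorInformation":""}}'
check_start_processing(success)  # returns None: the request was accepted
```

A zero ErrorCode only confirms acceptance; processing itself still completes asynchronously.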
Usage notes
- actions is required and must not be empty. Use "generate-draft" to trigger AI draft generation.
- recordingsToProcess is optional. When omitted, all recordings for the session are processed.
- Processing is asynchronous: the response confirms the request was accepted, not that processing is complete.
Errors
| Scenario | Result |
|---|---|
| Missing required fields | Server closes with code 1011 (InternalServerError). |
| Invalid request body | Server closes with code 1011 (InternalServerError). |
| Unknown message path | Server closes with code 1007 (InvalidPayloadData), reason: "Unknown message path". |
| Downstream processing failure | ErrorCode is non-zero in the response body. |
| Non-WebSocket request | HTTP 400. |
| Missing or invalid token | HTTP 401. |
Common types
AmbientSession fields
| Field | Type | Required | Description |
|---|---|---|---|
| productId | string | Yes | Microsoft unique identifier of the product. |
| partnerId | string | Yes | Microsoft unique identifier for the partner. |
| customerId | string | Yes | Microsoft unique identifier of the customer. |
| correlationId | string | Yes | Partner-assigned unique identifier of the session (GUID). Used to correlate results. |
| practitionerInfo | object | No | Practitioner metadata (identifiers, name, specialty). |
| ehrInstanceId | string | No | EHR instance identifier. |
| externalIdentifiers | ExternalIdentifier[] | No | External identifiers. Use type "userId" for the partner's user identifier. |
| creationDate | string (ISO 8601) | No | Session creation timestamp. |
| dstOffsetSeconds | int32 | No | DST offset in seconds. |
| clientInfo | object | No | Client application metadata (applicationId, applicationVersion, sdkVersion, deviceId, deviceType). |
| localeInfo | object | No | Locale preferences: recordingLocales, encounterReportLocale, encounterUxLocale. |
ExternalIdentifier
| Field | Type | Description |
|---|---|---|
| type | string | Identifier type (for example, "userId", "fhirId", "npi", "encounterId"). |
| identifier | string | The identifier value. |
Best practices
- Call retrieveConfiguration before recording to confirm supported locales and duration limits.
- Implement reconnection logic with exponential backoff for the bidirectional streaming endpoint.
- Track dataStored values to set startingOffset correctly when resuming after interruption.
- Include previousEncounterSessions when splitting recordings across multiple sessions.
- Call startProcessing after closing the recording. It requires a separate WebSocket connection.
- Monitor WebSocket close codes to distinguish normal closure from errors.
- Handle backpressure - don't send audio faster than the server can process.
- Initiate graceful shutdown - send a close frame rather than abruptly disconnecting.
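The reconnection advice above can be sketched as a delay schedule. The function name `backoff_delays`, the base and cap values, and the use of full jitter (a random delay between zero and the exponential ceiling) are all assumptions for illustration; the service does not prescribe specific values.

```python
import random


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 30.0, rng=None):
    """Yield one randomized delay (seconds) per reconnection attempt."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        # The ceiling doubles with each attempt until it reaches the cap.
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0, ceiling)


delays = list(backoff_delays(8, rng=random.Random(0)))
```

A caller would sleep for each delay before reopening GET /ws, then send RecordingOpen with startingOffset set to the last confirmed dataStored value.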