
ACS Real-time Transcription: TranscriptionData stops sending around 604-616 seconds (5 calls reproducible, WebSocket remains open, no Stopped/Failed event)

YSTK 0 Reputation points
2026-05-21T11:33:58.49+00:00

Symptom

Using Azure Communication Services Call Automation Real-time Transcription, ACS stops sending TranscriptionData over the WebSocket approximately 604–616 seconds (~10 minutes) after the transcription metadata message is accepted. Observed behavior:

  • The application-side WebSocket (/api/transcription) remains open (no close until the call ends).
  • ACS does not send TranscriptionStopped / TranscriptionFailed / TranscriptionResumed events (verified by a webhook handler that logs unknown event types at WARN: 0 such events in a 48-hour log).
  • The Call Connection itself remains active.
  • After this point, no TranscriptionData (Final or Intermediate) is delivered even when the user speaks.
  • As a result, calls longer than ~10 minutes become effectively unusable for speech recognition.
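The event check described in the list above can be sketched roughly as follows. This is a minimal illustration, not the handler used in the application: the `KNOWN_EVENTS` set, the `handle_event` helper, and the assumption that each callback carries a `type` string ending in the event name are all hypothetical.

```python
import logging

logger = logging.getLogger("acs.webhook")

# Hypothetical set of event names the application handles explicitly.
KNOWN_EVENTS = {"CallConnected", "CallDisconnected", "TranscriptionStarted"}


def handle_event(event: dict) -> bool:
    """Return True if the event is handled, False if it was logged as unknown.

    Assumption: the callback payload has a "type" string whose last dotted
    segment is the event name (for example "...TranscriptionStopped").
    """
    name = str(event.get("type", "")).rsplit(".", 1)[-1]
    if name in KNOWN_EVENTS:
        return True
    # Anything unexpected (including TranscriptionStopped / TranscriptionFailed)
    # is logged at WARN so it can be counted in the logs afterwards.
    logger.warning("unknown event type: %r", event.get("type"))
    return False
```

With a handler shaped like this, any TranscriptionStopped or TranscriptionFailed callback would appear as a WARN line, so a log with zero such lines suggests the events were never delivered.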

Reproducibility (5 calls)

| Call | Date (JST) | metadata accepted → last TranscriptionData | Notes |
|---|---|---|---|
| 1 | 2026-05-20 14:28 | 604.0 sec | 12 reconnect cycles during this call due to a separate app-side WebSocket ping issue (now fixed) |
| 2 | 2026-05-21 12:56 | 607.5 sec | Single transcription session (ping fix applied) |
| 3 | 2026-05-21 16:21 | 612.3 sec | Single session, baseline |
| 4 | 2026-05-21 17:43 | 616.3 sec | Attempted update_transcription(locale="ja-JP") (same locale) at 510 sec elapsed; the API call succeeded (~0.7 sec) but transcription still stopped |
| 5 | 2026-05-21 19:39 | 611.0 sec | Attempted update_transcription(locale="en-US") → 0.5 sec sleep → update_transcription(locale="ja-JP") (cross-locale 2-step) at 510 sec elapsed; both API calls succeeded (~0.4 sec total) but transcription still stopped |

All five calls reproduce the same behavior at ~604–616 sec, regardless of whether update_transcription is called as a workaround attempt.
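For anyone trying to reproduce the measurement, the "metadata accepted → last TranscriptionData" interval can be derived from timestamped WebSocket logs along these lines. This is only a sketch: the `(timestamp, kind)` record shape and the `TranscriptionMetadata` / `TranscriptionData` kind labels are assumptions about how the application logs incoming messages, not something taken from the logs in this report.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def seconds_until_last_data(records: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Seconds between the earliest metadata record and the latest data record.

    Each record is (iso_timestamp, kind). Returns None if either kind is missing.
    The kind labels are assumed names for the two message types.
    """
    metadata_at = None
    last_data_at = None
    for ts, kind in records:
        t = datetime.fromisoformat(ts)
        if kind == "TranscriptionMetadata":
            if metadata_at is None or t < metadata_at:
                metadata_at = t
        elif kind == "TranscriptionData":
            if last_data_at is None or t > last_data_at:
                last_data_at = t
    if metadata_at is None or last_data_at is None:
        return None
    return (last_data_at - metadata_at).total_seconds()
```

Taking the earliest metadata record matters for calls like Call 1, where reconnects would otherwise produce several metadata messages per call.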

Investigation Steps Already Performed

  1. Application side has no relevant timer or stop call: verified by code grep and runtime logs across all five calls.
  2. uvicorn[standard] WebSocket ping/pong disabled (ws_ping_interval=None, ws_ping_timeout=None). After this fix, WebSocket remains as a single session (only 1 metadata accepted event per call) — confirmed in calls 2/3/4/5.
  3. Container Apps system log (Envoy / ingress) at the affected moment: no close event observed (verified for call 2).
  4. stop_transcription → start_transcription: rejected by ACS with error code 8523 "Invalid Request, only one start transcription is supported per call at a time".
  5. update_transcription(locale="ja-JP") (same locale) at 510 sec elapsed (Call 4): API call succeeded, but timeout still occurred at 616 sec — no observable side effect on the internal session.
  6. update_transcription(locale="en-US") → sleep → update_transcription(locale="ja-JP") (cross-locale) at 510 sec elapsed (Call 5): both API calls succeeded, but timeout still occurred at 611 sec — cross-locale also has no observable side effect on the internal session.

⇒ Application-side workarounds we attempted (3 patterns) all fail to extend transcription beyond ~10 minutes.
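For reference, the update_transcription attempts from Calls 4 and 5 had roughly the following shape. This is a simplified sketch rather than the production code: `attempt_locale_refresh` and its parameters are illustrative names, and `conn` stands for any object exposing a synchronous `update_transcription(locale=...)` method, such as the call connection client used in this report. As noted above, this pattern did not prevent the stop.

```python
import asyncio
from typing import Optional, Protocol


class SupportsUpdateTranscription(Protocol):
    """Anything with a synchronous update_transcription(locale=...) method."""

    def update_transcription(self, *, locale: str) -> None: ...


async def attempt_locale_refresh(
    conn: SupportsUpdateTranscription,
    *,
    delay_sec: float = 510.0,
    target_locale: str = "ja-JP",
    detour_locale: Optional[str] = "en-US",
    pause_sec: float = 0.5,
) -> None:
    """Wait delay_sec, then re-apply the locale (optionally via a detour locale)."""
    await asyncio.sleep(delay_sec)
    if detour_locale is not None:
        # Cross-locale variant (Call 5): switch away first, then back.
        await asyncio.to_thread(conn.update_transcription, locale=detour_locale)
        await asyncio.sleep(pause_sec)
    # Same-locale variant (Call 4) is this call alone, with detour_locale=None.
    await asyncio.to_thread(conn.update_transcription, locale=target_locale)
```

Running the blocking SDK call in a worker thread keeps the event loop free to continue receiving WebSocket frames while the request is in flight.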

Expected vs Actual

  • Expected: TranscriptionData continues to be sent for the duration of the call (up to the documented 30-hour call limit).
  • Actual: TranscriptionData stops at ~604–616 sec, silently (no TranscriptionStopped event), with WebSocket and call connection still active.
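Because the stop is silent, the application can only infer it from the absence of messages. One possible way to flag the condition is a small watchdog like the sketch below. The class name, the 60-second default threshold, and the message kind labels are all assumptions chosen for illustration, and the check is a heuristic only: a stretch where nobody speaks would look the same as a stalled session.

```python
import time
from typing import Callable, Optional


class TranscriptionStallWatchdog:
    """Flags a possible stall when no transcription message arrives for a while."""

    def __init__(
        self,
        stall_after_sec: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._stall_after = stall_after_sec
        self._clock = clock
        self._last_seen: Optional[float] = None

    def on_message(self, kind: str) -> None:
        # Assumed kind labels for the metadata and data messages.
        if kind in ("TranscriptionMetadata", "TranscriptionData"):
            self._last_seen = self._clock()

    def is_stalled(self) -> bool:
        # Before the first message there is nothing to compare against.
        if self._last_seen is None:
            return False
        return self._clock() - self._last_seen >= self._stall_after
```

The WebSocket receive loop would call `on_message` for every parsed frame, and a periodic task would poll `is_stalled()` and log or alert when it returns True.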

Questions

  1. Is the ~10-minute stop a documented limitation or a bug? If documented, please point to the relevant section of the docs.
  2. If a documented limitation, what is the recommended pattern to continue transcription beyond 10 minutes within the same call?
  3. Given that stop_transcription+start_transcription (8523) and update_transcription (both same and cross locale) all fail to refresh the session, is there any supported API or pattern to renew transcription within a single call?
  4. ACS Real-time Transcription is currently Public Preview. Is there a known GA timeline or roadmap for production support of long-running calls (i.e., calls exceeding 10 minutes)?

Environment

  • Subscription type: Pay-As-You-Go (Subscription ID provided privately if needed)
  • ACS Resource location: global
  • Cognitive Services region: Japan East
  • SDK: azure-communication-callautomation 1.5.0 (Python)
  • azure-core 1.41.0
  • Hosting: Azure Container Apps (japaneast region)
  • Stack: Python 3.12 + FastAPI + uvicorn[standard]
  • TranscriptionOptions:
    • transport_type = WEBSOCKET
    • locale = "ja-JP"
    • start_transcription = False (then explicitly called from on_call_connected, matching the documented default + recommended pattern)
    • enable_intermediate_results = True

Reference: Tutorial alignment

Implementation follows the official Microsoft Learn tutorial:

https://learn.microsoft.com/en-us/azure/communication-services/how-tos/call-automation/real-time-transcription-tutorial

Specifically, startTranscription=False followed by explicit startTranscription() after CallConnected matches the documented default + recommended pattern.

Logs / Evidence

Application-side logs (Container Apps Console Logs) and ACS Diagnostic Logs (CallAutomationOperational, CallAutomationMediaSummary, CallAutomationStreamingUsage) can be provided privately. Call Connection IDs of the five reproducing calls are also available privately.

Note: We enabled CallAutomationStreamingUsage Diagnostic Settings on 2026-05-21, but the table still shows 0 rows ~5 hours after enabling. We cannot independently verify the StreamingDurationInMs value at this time.

Azure Communication Services

2 answers

  1. Golla Venkata Pavani 5,400 Reputation points Microsoft External Staff Moderator
    2026-05-21T13:22:40.6866667+00:00

    Hi @YSTK

    Thank you for reaching out about this issue.

    1. Is the ~10-minute stop a documented limitation or a bug? If documented, please point to the relevant section of the docs.
      Microsoft documentation does not mention any 10-minute (approximately 600–616 second) session timeout or hard limit for Real-time Transcription in Azure Communication Services Call Automation. The feature is expected to provide continuous transcription for the full duration of the call, up to the maximum call lifetime of 30 hours.

      The silent discontinuation of TranscriptionData (without TranscriptionStopped or TranscriptionFailed events) while the WebSocket and call remain active is not described as expected behavior.

    2. If a documented limitation, what is the recommended pattern to continue transcription beyond 10 minutes within the same call?
      Since this is not documented as a limitation, there is no official recommended pattern provided for handling a 10-minute cutoff.

      Documentation assumes that once started, real-time transcription continues delivering TranscriptionData for the entire call duration.

    3. Given that stop_transcription+start_transcription (8523) and update_transcription (both same and cross locale) all fail to refresh the session, is there any supported API or pattern to renew transcription within a single call?
      No, there is currently no supported API or documented pattern to renew or refresh a transcription session within a single call.
      • StopTranscription followed by StartTranscription is not supported while a transcription is active (consistent with error 8523).
      • UpdateTranscription is only documented for changing the locale or language detection settings; it does not reset or extend the internal transcription session.

    4. ACS Real-time Transcription is currently Public Preview. Is there a known GA timeline or roadmap for production support of long-running calls (i.e., calls exceeding 10 minutes)?
      Real-time Transcription is still in Public Preview. There is no publicly announced GA date or specific roadmap timeline for full production support of long-running calls.

    Reference:
    https://learn.microsoft.com/en-us/azure/communication-services/concepts/call-automation/real-time-transcription
    https://learn.microsoft.com/en-us/azure/communication-services/how-tos/call-automation/real-time-transcription-tutorial
    https://learn.microsoft.com/en-us/azure/communication-services/whats-new

    Kindly let us know if the above helps or you need further assistance on this issue.

    Please "accept" if the information helped you. This will help us and others in the community as well.


  2. AI answer

    2026-05-21T11:34:24.7633333+00:00

    The provided documentation and Q&A context do not describe any 10‑minute limit, timeout, or auto‑stop behavior for Azure Communication Services Call Automation real-time transcription.

    From the available content:

    • Real-time transcription is documented as a continuous feature that can be started with startTranscription and updated with updateTranscription (including locale and optional custom speech model endpoint), and stopped explicitly with stopTranscription.
    • Known limitations that are documented include:
      • OperationContext not updating correctly when updateTranscription is called (the TranscriptionUpdated event continues to show the original operationContext).
      • A limitation for 1:1 calls with ACS users using Client SDKs where startTranscription = True is not supported.
    • Event codes are documented for failure scenarios (for example, Cognitive Services authentication errors, forbidden requests, concurrent request limits, WebSocket connection failures, service shutdown, and unknown internal errors). These are surfaced as TranscriptionFailed events with specific codes/subcodes.
    • The tutorial and conceptual docs show that:
      • Transcription is started via startTranscription with options such as locale, operationContext, and optional AI capabilities.
      • Transcription can be updated via updateTranscription (locale and optional speech model endpoint) and stopped via stopTranscription.
      • There is no mention of a fixed maximum duration for a single transcription session (such as 10 minutes) or a requirement to periodically renew/refresh transcription.

    Given this, the questions can only be answered as follows based on the context:

    1. There is no documented ~10‑minute stop/timeout limit for real-time transcription in the provided documentation. The only documented limitations relate to operationContext not updating correctly and a specific unsupported 1:1 scenario.
    2. Because no such 10‑minute limit is documented in the context, there is no documented pattern in the provided material for extending or renewing transcription beyond 10 minutes within the same call.
    3. The only supported APIs described for controlling transcription are:
      • startTranscription (to start transcription)
      • updateTranscription (to change locale and optionally specify a custom speech recognition model endpoint)
      • stopTranscription (to stop transcription)
      The context does not describe any additional or special pattern/API to “renew” or “refresh” transcription within a single call beyond these operations.
    4. The context only states that some AI capabilities and real-time transcription features are in Preview (for example, AI capabilities on the transcription stream, real-time language detection, mid-call summaries). It does not provide any GA timeline, roadmap, or specific commitment for production support of long-running calls.

    For an issue like a silent stop of TranscriptionData around 10 minutes with no TranscriptionStopped/TranscriptionFailed events and an open WebSocket, the behavior is not covered by the documented limitations or event codes in the provided context. Escalation through Azure Support with call connection IDs and diagnostic logs would be required to get an authoritative answer and potential fix.


