Documentation question: Does Microsoft Stream transcription on SharePoint use Azure AI Speech?

Catherine M. Stockdale 0 Reputation points
2026-05-05T02:38:37.7733333+00:00

Hello,

I'm working on a research project that requires me to accurately describe the technology stack used for transcription within Microsoft 365, and I'm hoping you can help me clarify a technical question that I have not been able to confirm through public documentation.

My specific questions are:

  1. Does the transcription functionality in Microsoft Stream — which generates transcripts and closed captions for videos uploaded to SharePoint — use Azure AI Speech as its underlying automatic speech recognition (ASR) engine?
  2. Is there an existing Microsoft document or technical reference that describes how transcripts are generated for videos uploaded to SharePoint, including the underlying technology stack?

I have found Microsoft documentation confirming that:

However, I have not been able to find documentation explicitly confirming that SharePoint uses Azure AI Speech as its ASR engine, nor have I found a comprehensive reference describing the technology behind SharePoint's transcript generation. If you can confirm this connection or point me to relevant documentation, I would greatly appreciate it. If this information is not publicly available, a brief note to that effect would also be helpful.

Thank you for your time.

Azure Speech in Foundry Tools

2 answers

  1. SRILAKSHMI C 18,990 Reputation points Microsoft External Staff Moderator
    2026-05-05T15:51:27.0333333+00:00

    Hello @Catherine M. Stockdale

    Thanks for your detailed question.

    Does Stream (on SharePoint) use Azure AI Speech?

    From a platform perspective, yes: Microsoft Stream (on SharePoint) transcription is powered by Microsoft's cloud-based speech recognition services, which are part of the same underlying platform as Azure AI Speech.

    This is the same core speech technology that powers experiences across:

    • Microsoft Teams (live captions and transcription)
    • Microsoft 365 (dictation in Word/Office)
    • Microsoft Edge (Read Aloud, which is text-to-speech rather than speech-to-text)

    Additionally:

    • Stream does not use a custom on-premises ASR engine.
    • It relies on Microsoft-hosted speech services running in Azure for automatic transcription and caption generation.

    Documentation

    While the underlying technology is shared:

    Microsoft does not explicitly document that Microsoft Stream directly consumes Azure AI Speech as a customer-facing API or dependency.

    Instead, Stream uses Microsoft’s internal speech services, which are built on the same speech platform that powers Azure AI Speech. These internal services may differ from the publicly exposed APIs in terms of implementation and integration.

    Is there a detailed architecture or pipeline document?

    At present, there is no publicly available deep-dive or end-to-end architecture document specifically for Stream transcription in SharePoint.

    What is available:

    • Documentation confirming that Microsoft Stream automatically generates transcripts and produces WebVTT (.vtt) caption files
    • The "Updates to transcript functionality in Microsoft Stream" article (functional overview)
    • General documentation for Azure AI Speech
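For research that works with the transcript output rather than the backend, the WebVTT files mentioned above can be read with a few lines of code. The following is a minimal sketch, assuming a simple cue layout (a timing line followed by one or more text lines). The names `Cue` and `parse_vtt` and the sample text are hypothetical illustrations, not taken from Microsoft documentation, and real exports may carry extra metadata that this sketch ignores.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # cue start time in seconds
    end: float    # cue end time in seconds
    text: str     # caption text with markup tags removed


# Matches "hh:mm:ss.ttt --> hh:mm:ss.ttt"; the hours part is optional in WebVTT.
_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(h, m, s, ms):
    """Convert captured timestamp parts to seconds."""
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(content: str) -> List[Cue]:
    """Parse WebVTT text into cues (hypothetical helper, simplified)."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                raw = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                # Drop inline tags such as <v Speaker 1> ... </v>.
                text = re.sub(r"<[^>]+>", "", raw).strip()
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break  # blocks without a timing line are skipped
    return cues


# Hypothetical sample; not an actual Stream export.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:03.500
Hello and welcome.

00:00:03.500 --> 00:00:06.000
<v Speaker 1>This is a sample transcript.</v>
"""

if __name__ == "__main__":
    for cue in parse_vtt(SAMPLE):
        print(f"{cue.start:.1f}-{cue.end:.1f}: {cue.text}")
```

Parsing the caption file this way says nothing about which ASR engine produced it; it only makes the documented output format easier to analyze.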

    What is not available:

    • Detailed transcription pipeline for Stream
    • Architecture diagrams for SharePoint video processing
    • Explicit confirmation of direct Azure AI Speech API usage

    Please refer to the following:

    Updates to transcript functionality in Microsoft Stream https://learn.microsoft.com/stream/transcript-functionality-updates

    What Is Azure Speech? (Azure Cognitive Services Speech Service overview) https://learn.microsoft.com/azure/ai-services/speech-service/overview

    I hope this helps. Do let me know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful?".

    Thank you!

  2. Vinodh247-1375 42,776 Reputation points Volunteer Moderator
    2026-05-05T15:50:05.4766667+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    Short answer: There is no public Microsoft documentation that explicitly confirms that Microsoft Stream (on SharePoint) transcription uses Azure AI Speech as its underlying ASR engine.

    Explanation: While Azure AI Speech is officially documented as the speech engine behind services like Teams captions, dictation, and Edge Read Aloud, Microsoft has not published a technical reference that directly ties Stream (on SharePoint) transcript generation to that same service. Stream documentation confirms automatic transcript and VTT generation, but the underlying implementation is abstracted and described only as part of Microsoft 365 intelligent services. In practice, it is highly likely that Azure AI Speech or a closely related internal variant is used, but from a research or documentation standpoint, you should state that the exact ASR backend for SharePoint video transcription is not publicly disclosed by Microsoft.

    Please 'Upvote' (thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.
