Speech to text and speech analysis for videos and video streams.

alex 1 Reputation point
2022-03-29T11:49:18.44+00:00

Hello there.
I am working on a platform where I need to analyze what people say in videos and streams (using Extract key phrases, Named Entity Recognition (NER), and Find linked entities from the Azure Text Analytics service).
So far, for the videos, I am able to extract the audio from them, but the files are really huge (in both size and length).
My plan was to extract the audio from the videos, send it to the Azure Speech to Text service, get the transcription back, and run the transcription through the Azure Text Analytics service.
Is that the right approach? Is there any better way to do this? Should I use Batch transcription or the Speech SDK? Where can I find such examples?
What's the best way to do the same for live streams?
I have been reading the Speech to Text documentation, but I really can't grasp enough from it to know how to do this. After spending about 3 days on the documentation, I am more confused than before.
Thanks in advance and best wishes.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
2,069 questions
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
3,619 questions

1 answer

  1. Ramr-msft 17,826 Reputation points
    2022-03-30T10:27:00.223+00:00

    @alex Thanks. With Video Indexer, you can upload audio files using Video Indexer Studio. See the sample here for Media Services Video Indexer.
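
For reference, the pipeline described in the question (audio file, then Speech to Text, then Text Analytics) could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not an official sample: the function names are hypothetical, the input is assumed to be a WAV file already extracted from the video, and the `MAX_CHARS` and `BATCH_SIZE` values are assumed limits that should be checked against the current service documentation. It uses continuous recognition from the Speech SDK for Python and `extract_key_phrases` from the `azure-ai-textanalytics` package.

```python
import threading
from typing import Callable, List

MAX_CHARS = 5000  # assumed per-document size limit; verify against current service limits
BATCH_SIZE = 5    # assumed documents-per-request limit; verify against current service limits


def transcribe_file(wav_path: str, speech_key: str, region: str) -> str:
    """Transcribe a long WAV file using continuous recognition (Speech SDK)."""
    import azure.cognitiveservices.speech as speechsdk  # pip install azure-cognitiveservices-speech

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    parts: List[str] = []
    done = threading.Event()
    # Each finalized phrase arrives through the `recognized` event.
    recognizer.recognized.connect(lambda evt: parts.append(evt.result.text))
    # End of file or an error ends the session; either one unblocks the wait below.
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()
    recognizer.stop_continuous_recognition()
    return " ".join(p for p in parts if p)


def split_transcript(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split a transcript on whitespace into chunks of at most max_chars.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def collect_key_phrases(
    chunks: List[str],
    extract_batch: Callable[[List[str]], List[List[str]]],
    batch_size: int = BATCH_SIZE,
) -> List[str]:
    """Send chunks in batches and return de-duplicated phrases in first-seen order."""
    seen = set()
    ordered: List[str] = []
    for start in range(0, len(chunks), batch_size):
        for phrases in extract_batch(chunks[start:start + batch_size]):
            for phrase in phrases:
                if phrase.lower() not in seen:
                    seen.add(phrase.lower())
                    ordered.append(phrase)
    return ordered


def azure_key_phrase_extractor(endpoint: str, key: str) -> Callable[[List[str]], List[List[str]]]:
    """Build an extract_batch callable backed by the Text Analytics client."""
    from azure.ai.textanalytics import TextAnalyticsClient  # pip install azure-ai-textanalytics
    from azure.core.credentials import AzureKeyCredential

    client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

    def extract_batch(docs: List[str]) -> List[List[str]]:
        results = client.extract_key_phrases(docs)
        # Documents that failed yield an empty phrase list instead of raising.
        return [[] if r.is_error else list(r.key_phrases) for r in results]

    return extract_batch
```

With real credentials, the pieces would be combined as `collect_key_phrases(split_transcript(transcribe_file(path, speech_key, region)), azure_key_phrase_extractor(endpoint, key))`. The same client also exposes `recognize_entities` and `recognize_linked_entities`, which accept the same lists of chunks, so NER and linked entities could follow the same batching pattern. For very large recordings already sitting in storage, Batch transcription may be a better fit than streaming each file through the SDK.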

