Speech to text and speech analysis for videos and video streams.

Question

alex 1

Hello there.
I am working on a platform where I need to analyze what people talk about in videos and streams (using Extract key phrases, Named Entity Recognition (NER), and Find linked entities from the Azure Text Analytics service).
So far, for the videos, I am able to get the audio out of them, but the files are really huge (in size and length).
My plan was to get the audio out of the videos, send it to the Azure Speech to Text service, get the transcription back, and run the transcription through the Azure Text Analytics service.
Is that the right approach? Is there any better way to do this? Should I use Batch transcription or the Speech SDK? Where can I find such examples?
What's the best way to do the same for live streams?
I am reading the Speech to Text documentation, but I really can't grasp enough from it to know how to do this. After spending about 3 days on the documentation, I am more confused than before.
Thanks in advance and best wishes.
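
For readers following the same plan, the two steps that do not depend on any Azure SDK can be sketched as below: building an ffmpeg command that strips the video track and writes short WAV segments, and splitting a finished transcript into documents small enough for text analysis. This is a minimal sketch under stated assumptions, not a reference implementation. The function names, the 600-second segment length, the 5,000-character document size, and the batch size of 5 are illustrative choices; check the current service limits before relying on them. The actual Speech to Text and Text Analytics calls are deliberately left out.

```python
import re
from typing import Iterator, List


def ffmpeg_extract_command(video_path: str, wav_pattern: str,
                           segment_seconds: int = 600) -> List[str]:
    """Build (but do not run) an ffmpeg command that drops the video track
    and writes 16 kHz, mono, 16-bit PCM WAV segments.

    wav_pattern is an ffmpeg output pattern such as "audio_%03d.wav".
    The segment length is an assumption chosen to keep files small.
    """
    return [
        "ffmpeg", "-i", video_path,
        "-vn",                    # no video stream in the output
        "-ac", "1",               # mono
        "-ar", "16000",           # 16 kHz sample rate
        "-acodec", "pcm_s16le",   # 16-bit PCM
        "-f", "segment", "-segment_time", str(segment_seconds),
        wav_pattern,
    ]


def chunk_transcript(text: str, max_chars: int = 5000) -> List[str]:
    """Split a transcript into chunks of at most max_chars characters,
    breaking on sentence boundaries where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at hard offsets.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def batched(items: List[str], size: int = 5) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items (one group per request)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    print(" ".join(ffmpeg_extract_command("talk.mp4", "talk_%03d.wav")))
    for batch in batched(chunk_transcript("First point. Second point. Third.", 20), 2):
        print(batch)
```

The command list can be passed to `subprocess.run` once ffmpeg is installed. Each batch of chunks would then go to the text analysis client in one request, with the per-chunk results merged afterwards.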

Ramr-msft 17,826 Reputation points

2022-03-30T10:26:43.043+00:00

@alex Thanks for the question. Can you please add more details about the use case that you are trying to implement?

Answer 1

Ramr-msft 17,826

@alex Thanks. With Video Indexer you can upload audio files using Video Indexer Studio. See the sample here for Media Services Video Indexer.
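
Besides the Studio web UI, Video Indexer also exposes a REST API, so the upload can be scripted. The sketch below only builds the request URLs and makes no network calls. It assumes the commonly documented URL layout (`/{location}/Accounts/{accountId}/Videos`) and that an access token has already been obtained. The helper names are hypothetical, and the exact parameters should be checked against the current API reference.

```python
from urllib.parse import quote, urlencode

# Assumed API root for Video Indexer; verify against the current API reference.
API_ROOT = "https://api.videoindexer.ai"


def upload_url(location: str, account_id: str, access_token: str,
               name: str, video_url: str) -> str:
    """URL for a POST request that asks Video Indexer to fetch and index
    a media file from a publicly reachable video_url."""
    query = urlencode({
        "accessToken": access_token,
        "name": name,
        "videoUrl": video_url,
    })
    return f"{API_ROOT}/{quote(location)}/Accounts/{quote(account_id)}/Videos?{query}"


def index_url(location: str, account_id: str, video_id: str,
              access_token: str) -> str:
    """URL for a GET request that returns the index (transcript and
    extracted insights) once processing has finished."""
    query = urlencode({"accessToken": access_token})
    return (f"{API_ROOT}/{quote(location)}/Accounts/{quote(account_id)}"
            f"/Videos/{quote(video_id)}/Index?{query}")


if __name__ == "__main__":
    # Placeholder values only; none of these identify a real account.
    print(upload_url("trial", "my-account-id", "my-token",
                     "my talk", "https://example.com/a.mp4"))
    print(index_url("trial", "my-account-id", "my-video-id", "my-token"))
```

The upload URL would be sent with an HTTP POST and the index URL polled with GET until processing is reported as finished.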

alex 1 Reputation point

2022-03-31T09:37:21.877+00:00

Thanks @Ramr-msft for the suggestion and your help. I am trying it at the moment; I will let you know what I end up with.

alex 1 Reputation point

2022-03-31T10:44:12.227+00:00

I have to say, the documentation is really bad. Many links in it return 404, and there are very few examples. This is the main reason why I hate working with anything from Microsoft or Amazon: both have bad documentation.
