Voice/Speech to Text Train Model

Question

Voice/Speech to Text Train Model

Nathan Carns 196

Hi.

So I would like to create a model that 'listens' to audio from movies/podcasts (with subtitles) then returns the text transcript from it. Problem is, it's in a language not supported by Azure (or most of the big cloud providers). How would I go about and, from scratch, build a model that is trained on the audio from a new language? The input audio all will have subtitles or captions.

I tried Azure ML studio but I couldn't create datasets with audio files. Not sure if I missed something there. Also tried Speech studio but it only supports a select number of languages. Would that be possible at all?

Any suggestions would be appreciated. Thanks.

YutongTie-MSFT 53,971 Reputation points Moderator

2021-07-27T07:16:06.897+00:00

@Nathan Carns Thanks for reaching out to us. So the main idea of your project is to train model with speech input. Could you please share the input languages and your training method? I don't think Azure ML studio(classic) is supporting this scenario, but I would like to learn more to see if we have anything to help.

Regards,
Yutong
Nathan Carns 196 Reputation points

2021-07-27T11:32:11.093+00:00

Thank you for the reply.

The audio input will be in common Nigerian languages (yoruba, igbo hausa). There isn't speech to text support as far as I researched. I want to basically input audio files to the model and it generates transcripts/text. We'll train the model on audio files containing subtitles/captions (separate) and when we have to validate the data, we input audio files and it outputs the corresponding text.

Not sure how feasible this is. Might it require us to develop a deep learning model?

Accepted answer

0 additional answers

Your answer

YutongTie-MSFT 53,971 Reputation points Moderator

2021-07-27T07:16:06.897+00:00

@Nathan Carns Thanks for reaching out to us. So the main idea of your project is to train model with speech input. Could you please share the input languages and your training method? I don't think Azure ML studio(classic) is supporting this scenario, but I would like to learn more to see if we have anything to help.

Regards,
Yutong
Nathan Carns 196 Reputation points

2021-07-27T11:32:11.093+00:00

Thank you for the reply.

The audio input will be in common Nigerian languages (yoruba, igbo hausa). There isn't speech to text support as far as I researched. I want to basically input audio files to the model and it generates transcripts/text. We'll train the model on audio files containing subtitles/captions (separate) and when we have to validate the data, we input audio files and it outputs the corresponding text.

Not sure how feasible this is. Might it require us to develop a deep learning model?

Answer 1

YutongTie-MSFT 53,971 Moderator

@Nathan Carns Yes, you are correct, to develop a model for speech to text we need a deep learning model here. This is out of the scope of Azure Machine Learning Studio(classic). But I think Azure Machine Learning service should support it, please refer to this: https://learn.microsoft.com/en-us/azure/machine-learning/concept-deep-learning-vs-machine-learning#machine-translation

I have found one post which may help: https://towardsdatascience.com/audio-deep-learning-made-simple-automatic-speech-recognition-asr-how-it-works-716cfce4c706

Moreover, I have forwarded your feedback to see any plan here for Nigerian in Azure.

Thanks.
Yutong

Nathan Carns 196 Reputation points

2021-08-06T16:16:37.38+00:00

Thank you so much for this. I will look into Azure ML and the links you provided me. Please let me know if you get any response regarding the Nigerian languages.

Share via

Voice/Speech to Text Train Model

0 additional answers

Your answer