Voice/Speech to Text Train Model

Nathan Carns 196 Reputation points
2021-07-26T20:15:07.383+00:00

Hi.

I would like to create a model that 'listens' to audio from movies and podcasts (with subtitles) and returns the text transcript. The problem is that the audio is in a language not supported by Azure (or by most of the big cloud providers). How would I go about building, from scratch, a model trained on audio in a new language? All of the input audio will have subtitles or captions.

I tried Azure ML studio, but I couldn't create datasets from audio files; I'm not sure whether I missed something there. I also tried Speech Studio, but it only supports a select number of languages. Is this possible at all?

Any suggestions would be appreciated. Thanks.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure Machine Learning
An Azure machine learning service for building and deploying models.

Accepted answer
  1. YutongTie-MSFT 53,966 Reputation points Moderator
    2021-08-04T23:39:40.253+00:00

    @Nathan Carns Yes, you are correct: developing a speech-to-text model requires a deep learning model. This is out of scope for Azure Machine Learning Studio (classic), but I think the Azure Machine Learning service should support it. Please refer to: https://learn.microsoft.com/en-us/azure/machine-learning/concept-deep-learning-vs-machine-learning#machine-translation

    I have found one post that may help: https://towardsdatascience.com/audio-deep-learning-made-simple-automatic-speech-recognition-asr-how-it-works-716cfce4c706

    Moreover, I have forwarded your feedback to find out whether there is any plan to support Nigerian in Azure.

    Thanks.
    Yutong
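
Since all of the input audio comes with subtitles or captions, a likely first step before any training is to turn each subtitle file into (audio file, start time, end time, text) pairs. The sketch below is one possible way to do that, assuming the subtitles are in SRT format; the names `Segment`, `parse_srt`, `to_manifest`, and the file name `episode01.wav` are hypothetical and used only for illustration. It relies on the Python standard library alone and does not touch the audio itself.

```python
import json
import re
from dataclasses import asdict, dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Segment:
    """One training example: a time span in an audio file plus its text."""
    audio_path: str  # hypothetical path to the audio the subtitles belong to
    start: float     # start time in seconds
    end: float       # end time in seconds
    text: str        # subtitle text for that span


def _seconds(h, m, s, ms):
    """Convert the captured hour/minute/second/millisecond strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text, audio_path):
    """Parse SRT subtitle text into a list of Segment objects."""
    segments = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if not match:
                continue
            groups = match.groups()
            # Everything after the timing line is the cue text.
            text = " ".join(lines[i + 1:])
            # Drop simple markup such as <i>...</i>.
            text = re.sub(r"<[^>]+>", "", text).strip()
            if text:
                segments.append(
                    Segment(audio_path, _seconds(*groups[:4]), _seconds(*groups[4:]), text)
                )
            break
    return segments


def to_manifest(segments):
    """Serialize segments as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(asdict(s), ensure_ascii=False) for s in segments)


SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:04,000 --> 00:00:06,250
<i>How are you</i>
today?
"""

segments = parse_srt(SAMPLE_SRT, "episode01.wav")
print(to_manifest(segments))
```

Subtitle timings are often only loosely aligned with the speech, so the resulting spans may need filtering or re-alignment before they are used as training data.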

